Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-8298

Invalid query causes an ANTLR and czar crashes

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      [2016-11-16T16:34:29.712-0600] [LWP:390] INFO  czar.Czar (core/modules/czar/Czar.cc:86) - New query: SELECT * FROM RunDeepSource WHERE qserv_areaspec_circle(coord_ra,coord_decl,9.469,-1.152,0.01)=1;, hints: [("client_dst_name", "141.142.181.128:4040"), ("db", "sdss_stripe82_00"), ("server_thread_id", "889")]
      [2016-11-16T16:34:29.750-0600] [LWP:390] DEBUG ccontrol.UserQueryType (core/modules/ccontrol/UserQueryType.cc:98) - isSelect: SELECT * FROM RunDeepSource WHERE qserv_areaspec_circle(coord_ra,coord_decl,9.469,-1.152,0.01)=1;
       
      [2016-11-16T16:34:29.750-0600] [LWP:390] DEBUG ccontrol.UserQueryType (core/modules/ccontrol/UserQueryType.cc:102) - isSelect: match
       
      [2016-11-16T16:34:29.753-0600] [LWP:390] ERROR ccontrol.UserQueryFactory (core/modules/ccontrol/UserQueryFactory.cc:113) - Invalid query: ParseException:Parse error(ANTLR):unexpected token: coord_ra:
      

        Attachments

          Activity

          Hide
          salnikov Andy Salnikov added a comment -

          PR, please?

          Show
          salnikov Andy Salnikov added a comment - PR, please?
          Hide
          gapon Igor Gaponenko added a comment -

          Pull request?

          Show
          gapon Igor Gaponenko added a comment - Pull request?
          Hide
          salnikov Andy Salnikov added a comment -

          John Gates, I'm not quite sure why this fix fixes it. Can you explain it a bit, did the crash happen because of uncaught exception or was it a SEGV?

          Show
          salnikov Andy Salnikov added a comment - John Gates , I'm not quite sure why this fix fixes it. Can you explain it a bit, did the crash happen because of uncaught exception or was it a SEGV?
          Hide
          jgates John Gates added a comment -

          What would happen before is that an invalid query would be passed into the constructor for UserQuerySelect and then call _qMetaRegister with some bad information, and it would blow up in _qMetaRegister. I didn't check what actually caused the problem in qMetaRegister, but I can see a few way things could go south with looking up non--existent tables and so forth. The change here only calls qMetaRegister if the query is reasonably valid, side stepping many possible errors, and is better in that I thought the constructor was doing too much anyway.

          Show
          jgates John Gates added a comment - What would happen before is that an invalid query would be passed into the constructor for UserQuerySelect and then call _qMetaRegister with some bad information, and it would blow up in _qMetaRegister. I didn't check what actually caused the problem in qMetaRegister, but I can see a few way things could go south with looking up non--existent tables and so forth. The change here only calls qMetaRegister if the query is reasonably valid, side stepping many possible errors, and is better in that I thought the constructor was doing too much anyway.
          Hide
          salnikov Andy Salnikov added a comment -

          OK, I disagree with "constructor doing too much" statement, but the whole UserQuerySelect construction process is a mess which leaves things in inconsistent state this is why we see this crash. And I'm not sure that this fix fixes all those issues, we'll need to review the whole thing at some point (I tried to improve it when I migrated czar to C++ but obviously I did not succeed).

          For the record, I dug a little bit to see where it crashes, here is gdb backtrace:

          Program received signal SIGSEGV, Segmentation fault.
          0x00007f29a00a2de4 in std::__shared_ptr<lsst::qserv::query::OrderByClause, (__gnu_cxx::_Lock_policy)2>::operator bool
              (this=0x30) at /usr/include/c++/4.8.2/bits/shared_ptr_base.h:923
          923	      { return _M_ptr == 0 ? false : true; }
           
          #0  0x00007f29a00a2de4 in std::__shared_ptr<lsst::qserv::query::OrderByClause, (__gnu_cxx::_Lock_policy)2>::operator bool (this=0x30) at /usr/include/c++/4.8.2/bits/shared_ptr_base.h:923
          #1  0x00007f29a00a2b12 in lsst::qserv::query::SelectStmt::hasOrderBy (this=0x0)
              at core/modules/query/SelectStmt.h:120
          #2  0x00007f29a01492a3 in lsst::qserv::qproc::QuerySession::getProxyOrderBy (this=0x19b9d38)
              at core/modules/qproc/QuerySession.cc:176
          #3  0x00007f299ff6caa8 in lsst::qserv::ccontrol::UserQuerySelect::_qMetaRegister (this=0x19cfb08)
              at core/modules/ccontrol/UserQuerySelect.cc:375
          #4  0x00007f299ff69e29 in lsst::qserv::ccontrol::UserQuerySelect::UserQuerySelect (this=0x19cfb08, 
              qs=std::shared_ptr (count 2, weak 0) 0x19b9d38, messageStore=std::shared_ptr (count 2, weak 0) 0x19b7f78, 
              executive=std::shared_ptr (empty) 0x0, infileMergerConfig=std::shared_ptr (empty) 0x0, 
              secondaryIndex=std::shared_ptr (count 2, weak 0) 0x198d2f8, 
              queryMetadata=std::shared_ptr (count 2, weak 0) 0x1963548, czarId=1, errorExtra="")
              at core/modules/ccontrol/UserQuerySelect.cc:151
          

          (basically QuerySession::getProxyOrderBy() SEGVs when session is in "error" state, which is wrong).

          I'm OK with the current fix if it helps PDAC to move forward, but we do need to improve that code a lot. Igor Gaponenko, please mark the ticket reviewed when you are satisfied.

          Show
          salnikov Andy Salnikov added a comment - OK, I disagree with "constructor doing too much" statement, but the whole UserQuerySelect construction process is a mess which leaves things in inconsistent state this is why we see this crash. And I'm not sure that this fix fixes all those issues, we'll need to review the whole thing at some point (I tried to improve it when I migrated czar to C++ but obviously I did not succeed). For the record, I dug a little bit to see where it crashes, here is gdb backtrace: Program received signal SIGSEGV, Segmentation fault. 0x00007f29a00a2de4 in std::__shared_ptr<lsst::qserv::query::OrderByClause, (__gnu_cxx::_Lock_policy)2>::operator bool (this=0x30) at /usr/include/c++/4.8.2/bits/shared_ptr_base.h:923 923 { return _M_ptr == 0 ? false : true; }   #0 0x00007f29a00a2de4 in std::__shared_ptr<lsst::qserv::query::OrderByClause, (__gnu_cxx::_Lock_policy)2>::operator bool (this=0x30) at /usr/include/c++/4.8.2/bits/shared_ptr_base.h:923 #1 0x00007f29a00a2b12 in lsst::qserv::query::SelectStmt::hasOrderBy (this=0x0) at core/modules/query/SelectStmt.h:120 #2 0x00007f29a01492a3 in lsst::qserv::qproc::QuerySession::getProxyOrderBy (this=0x19b9d38) at core/modules/qproc/QuerySession.cc:176 #3 0x00007f299ff6caa8 in lsst::qserv::ccontrol::UserQuerySelect::_qMetaRegister (this=0x19cfb08) at core/modules/ccontrol/UserQuerySelect.cc:375 #4 0x00007f299ff69e29 in lsst::qserv::ccontrol::UserQuerySelect::UserQuerySelect (this=0x19cfb08, qs=std::shared_ptr (count 2, weak 0) 0x19b9d38, messageStore=std::shared_ptr (count 2, weak 0) 0x19b7f78, executive=std::shared_ptr (empty) 0x0, infileMergerConfig=std::shared_ptr (empty) 0x0, secondaryIndex=std::shared_ptr (count 2, weak 0) 0x198d2f8, queryMetadata=std::shared_ptr (count 2, weak 0) 0x1963548, czarId=1, errorExtra="") at core/modules/ccontrol/UserQuerySelect.cc:151 (basically QuerySession::getProxyOrderBy() SEGVs when session is in "error" state, which is wrong). I'm OK with the current fix if it helps PDAC to move forward, but we do need to improve that code a lot. Igor Gaponenko , please mark the ticket reviewed when you are satisfied.
          Hide
          jgates John Gates added a comment -

          Maybe the constructor is doing too much is the wrong way to put it, but it was trying to do a lot things that are pointless when the query is already know to be invalid. This is the second time I've been in this code for a czar crash in the last 5 months, which is why I just moved it out of the constructor. We should discuss what really needs to happen with registration of invalid queries and what this code should do.

          Show
          jgates John Gates added a comment - Maybe the constructor is doing too much is the wrong way to put it, but it was trying to do a lot things that are pointless when the query is already know to be invalid. This is the second time I've been in this code for a czar crash in the last 5 months, which is why I just moved it out of the constructor. We should discuss what really needs to happen with registration of invalid queries and what this code should do.
          Hide
          gapon Igor Gaponenko added a comment -

          I've reviewed the changes, and they all make a sense to me.

          Show
          gapon Igor Gaponenko added a comment - I've reviewed the changes, and they all make a sense to me.

            People

            Assignee:
            jgates John Gates
            Reporter:
            jgates John Gates
            Reviewers:
            Andy Salnikov, Igor Gaponenko
            Watchers:
            Andy Salnikov, Gregory Dubois-Felsmann, Igor Gaponenko, John Gates
            Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: