  1. Data Management
  2. DM-8187

Qserv czar crashes itself and mysql-proxy on invalid queries

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:

      Description

      Problem summary

      The Qserv mysql-proxy service always crashes on queries that reference a non-existent database or that are issued without any database context. The same behavior is observed for queries addressed to databases that exist in the MySQL/MariaDB service of the Qserv master node but are not registered in Qserv's CSS.

      Examples of queries based on the integration test setup:

      SELECT COUNT(*) FROM AnyTable;
      SELECT COUNT(*) FROM UnknownDatabase.SomeTable;
      SELECT COUNT(*) FROM qservTest_case03_mysql.RunDeepSource;
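
      The three failing cases above can be told apart before a query is dispatched. The following is only a minimal sketch of such a pre-dispatch check, not Qserv code: the enum DbCheck, the function checkDatabaseContext, and the idea of passing the CSS-registered databases as a std::set are all assumptions made for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical result of a pre-dispatch check; not part of the Qserv API.
enum class DbCheck { Ok, NoDatabaseContext, NotRegistered };

// Decide whether a query may be dispatched, given:
//   queryDb       - database named in the query (empty if the table is unqualified)
//   defaultDb     - database selected for the session (empty if none)
//   registeredDbs - databases assumed to be registered in CSS
inline DbCheck checkDatabaseContext(std::string const& queryDb,
                                    std::string const& defaultDb,
                                    std::set<std::string> const& registeredDbs) {
    // An explicit database in the query takes precedence over the session default.
    std::string const& db = queryDb.empty() ? defaultDb : queryDb;
    if (db.empty()) return DbCheck::NoDatabaseContext;
    if (registeredDbs.count(db) == 0) return DbCheck::NotRegistered;
    return DbCheck::Ok;
}
```

      Under these assumptions the first example query maps to NoDatabaseContext, and the other two map to NotRegistered, regardless of whether the database exists in MySQL/MariaDB.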
      

      Details

      Once the crash happens, no further details are found in the service's log files (the listing below was taken by logging into a running Docker container):

      [gapon@lsst-qserv-master01 ~] docker exec -it qserv bash
      qserv@lsst-qserv-master01:/qserv$ ls -al run/var/log/
      ..
      -rw-r--r-- 1 qserv qserv 2193695372 Nov  4 00:52 mysql-proxy-lua.log
      -rw-r----- 1 qserv qserv      11811 Nov  4 00:51 mysql-proxy.log
      

      The only (and the last) relevant record left in mysql-proxy-lua.log is about the query causing the crash. For example:

      % tail  mysql-proxy-lua.log
      ..[2016-11-04T01:27:40.883-0500] [LWP:1402] DEBUG ccontrol.UserQuerySelect (core/modules/ccontrol/UserQuerySelect.cc:397) - QI=227: UserQuery registered SELECT * FROM R LIMIT 1
      

      Another obstacle to investigating the root cause was that the crashed process left no core file. Further investigation revealed that this happened because the proxy is usually launched with the --daemon option (the listing below was taken from within a running Docker container):

      qserv@lsst-qserv-master01:/qserv$ ps -ef | grep proxy
      qserv     1540     0  0 01:44 ?        00:00:00 mysql-proxy --daemon --proxy-lua-script=…
      

      To obtain a core dump, the following actions were taken. First, the core_pattern setting of the container's host machine was modified to prefix core files with the name of the crashed executable:

      sudo -i
      echo "%e.core" > /proc/sys/kernel/core_pattern
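
      To illustrate how such a pattern turns into a file name, here is a minimal sketch of the expansion rules for a small subset of specifiers (%e, %p, %%). The function expandCorePattern is hypothetical and written for illustration only. The coreUsesPid flag models the kernel's core_uses_pid setting, which appends ".PID" when the pattern contains no %p; whether that setting was enabled on this host is an assumption, not something stated in this report.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Expand a subset of core_pattern specifiers:
//   %e -> executable name, %p -> PID, %% -> literal '%'.
// Any other specifier, and a lone trailing '%', is dropped in this sketch.
inline std::string expandCorePattern(std::string const& pattern,
                                     std::string const& exeName,
                                     long pid,
                                     bool coreUsesPid) {
    std::string out;
    bool sawPid = false;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        char c = pattern[i];
        if (c != '%') { out += c; continue; }
        if (++i == pattern.size()) break;  // lone '%' at the end is dropped
        switch (pattern[i]) {
            case '%': out += '%'; break;
            case 'e': out += exeName; break;
            case 'p': out += std::to_string(pid); sawPid = true; break;
            default: break;  // unsupported specifier: ignored here
        }
    }
    // With core_uses_pid enabled and no %p in the pattern, ".PID" is appended.
    if (coreUsesPid && !sawPid) out += "." + std::to_string(pid);
    return out;
}
```

      With the pattern "%e.core", executable name "mysql-proxy", PID 1402 and coreUsesPid set to true, this sketch yields "mysql-proxy.core.1402".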
      

      The next step was to ensure that no core dump size limit is set for user qserv within the Docker container running the master image:

      [gapon@lsst-qserv-master01 ~] docker exec -it qserv bash
       
      qserv@lsst-qserv-master01:/qserv$ ulimit -c unlimited
      qserv@lsst-qserv-master01:/qserv$ ulimit -a
      core file size          (blocks, -c) unlimited
      ...
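
      The same limit can also be inspected from inside a process. The sketch below uses the POSIX getrlimit() call with RLIMIT_CORE; the helper names describeCoreLimit and currentCoreLimit are made up for this example. Note that getrlimit() reports the limit in bytes, whereas the ulimit output above is in blocks.

```cpp
#include <cassert>
#include <string>
#include <sys/resource.h>

// Render a core-size soft limit in human-readable form.
inline std::string describeCoreLimit(rlim_t softLimit) {
    if (softLimit == RLIM_INFINITY) return "unlimited";
    if (softLimit == 0) return "disabled";
    return std::to_string(static_cast<unsigned long long>(softLimit)) + " bytes";
}

// Query the calling process's RLIMIT_CORE soft limit via POSIX getrlimit().
// Returns "unknown" if the call fails.
inline std::string currentCoreLimit() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) return "unknown";
    return describeCoreLimit(rl.rlim_cur);
}
```

      A service could log currentCoreLimit() at startup so that a "disabled" limit is noticed before a crash rather than after it.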
      

      The next step was to remove the --daemon option from the service management script:

      /qserv/run/etc/init.d/mysql-proxy
      

      The new configuration was tested by stopping/starting the service from within the container:

      qserv@lsst-qserv-master01:/qserv$ run/etc/init.d/mysql-proxy stop
      [ ok ] Stopping mysql-proxy.
      qserv@lsst-qserv-master01:/qserv$ run/etc/init.d/mysql-proxy start
      [ ok ] Starting mysql-proxy..
      

      Then the following query was issued to crash the service:

      SELECT * FROM R LIMIT 1
      

      After the service went down, the desired core file was found in the following folder of the running container:

      qserv@lsst-qserv-master01:/qserv$ ls -al 
      ..
      -rw-------  1 qserv qserv 28422144 Nov  4 01:27 mysql-proxy.core.1402
      

      The dump was analyzed with gdb to get the stack of the crash:

      qserv@lsst-qserv-master01:/qserv$ which mysql-proxy          
      /qserv/stack/Linux64/mysqlproxy/0.8.5+12/bin/mysql-proxy
       
      qserv@lsst-qserv-master01:/qserv$ gdb `which mysql-proxy` mysql-proxy.core.1402
      Reading symbols from /qserv/stack/Linux64/mysqlproxy/0.8.5+12/bin/mysql-proxy...done.
      [New LWP 1402]
      [New LWP 1436]
      [New LWP 1437]
      [New LWP 1438]
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
      Core was generated by `mysql-proxy --proxy-lua-script=/qserv/stack/Linux64/qserv/12.1.rc1-3-g72e15fd+3'.
      Program terminated with signal SIGSEGV, Segmentation fault.
      #0  0x00007f4bf1445573 in lsst::qserv::qdisp::Executive::setQueryId (this=0x0, id=227) at core/modules/qdisp/Executive.cc:103
      103
      core/modules/qdisp/Executive.cc: No such file or directory.
       
       
      (gdb) where
      #0  0x00007f4bf1445573 in lsst::qserv::qdisp::Executive::setQueryId (this=0x0, id=227) at core/modules/qdisp/Executive.cc:103
      #1  0x00007f4bf1293bc8 in lsst::qserv::ccontrol::UserQuerySelect::_qMetaRegister (this=0x180ea80) at core/modules/ccontrol/UserQuerySelect.cc:398
      #2  0x00007f4bf1290c1c in lsst::qserv::ccontrol::UserQuerySelect::UserQuerySelect (this=0x180ea80, qs=std::shared_ptr (count 2, weak 0) 0x1801870, 
          messageStore=std::shared_ptr (count 2, weak 0) 0x180efc0, executive=std::shared_ptr (empty) 0x0, infileMergerConfig=std::shared_ptr (empty) 0x0, 
          secondaryIndex=std::shared_ptr (count 2, weak 0) 0x179fa70, queryMetadata=std::shared_ptr (count 2, weak 0) 0x17c9da0, czarId=2, errorExtra="")
          at core/modules/ccontrol/UserQuerySelect.cc:151
      #3  0x00007f4bf12868b6 in __gnu_cxx::new_allocator<lsst::qserv::ccontrol::UserQuerySelect>::construct<lsst::qserv::ccontrol::UserQuerySelect<std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&> > (this=0x7ffc21ece13f, __p=0x180ea80)
          at /usr/include/c++/4.9/ext/new_allocator.h:120
      #4  0x00007f4bf128607a in std::allocator_traits<std::allocator<lsst::qserv::ccontrol::UserQuerySelect> >::_S_construct<lsst::qserv::ccontrol::UserQuerySelect<std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&> >(std::allocator<lsst::qserv::ccontrol::UserQuerySelect>&, std::allocator_traits<std::allocator<lsst::qserv::ccontrol::UserQuerySelect> >::__construct_helper*, (lsst::qserv::ccontrol::UserQuerySelect<std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&>&&)...) (__a=..., 
          __p=0x180ea80) at /usr/include/c++/4.9/bits/alloc_traits.h:253
      #5  0x00007f4bf12858d4 in std::allocator_traits<std::allocator<lsst::qserv::ccontrol::UserQuerySelect> >::construct<lsst::qserv::ccontrol::UserQuerySelect<std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&> >(std::allocator<lsst::qserv::ccontrol::UserQuerySelect>&, lsst::qserv::ccontrol::UserQuerySelect<std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&>*, (lsst::qserv::ccontrol::UserQuerySelect<std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&>&&)...) (__a=..., __p=0x180ea80) at /usr/include/c++/4.9/bits/alloc_traits.h:399
      #6  0x00007f4bf1284c74 in std::_Sp_counted_ptr_inplace<lsst::qserv::ccontrol::UserQuerySelect, std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&> (this=0x180ea70, __a=...) at /usr/include/c++/4.9/bits/shared_ptr_base.h:515
      #7  0x00007f4bf1283f40 in __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<lsst::qserv::ccontrol::UserQuerySelect, std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, (__gnu_cxx::_Lock_policy)2> >::construct<std::_Sp_counted_ptr_inplace<lsst::qserv::ccontrol::UserQuerySelect, std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, (__gnu_cxx::_Lock_policy)2><std::allocator<lsst::qserv::ccontrol::UserQuerySelect> const, std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&> > (this=0x7ffc21ece387, __p=0x180ea70) at /usr/include/c++/4.9/ext/new_allocator.h:120
      #8  0x00007f4bf1283427 in std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<lsst::qserv::ccontrol::UserQuerySelect, std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, (__gnu_cxx::_Lock_policy)2> > >::_S_construct<std::_Sp_counted_ptr_inplace<lsst::qserv::ccontrol::UserQuerySelect, std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, (__gnu_cxx::_Lock_policy)2><std::allocator<lsst::qserv::ccontrol::UserQuerySelect> const, std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&> >(std::allocator<std::_Sp_counted_ptr_inplace<lsst::qserv::ccontrol::UserQuerySelect, std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, (__gnu_cxx::_Lock_policy)2> >&, std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<lsst::qserv::ccontrol::UserQuerySelect, std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, (__gnu_cxx::_Lock_policy)2> > >::__construct_helper*, (std::_Sp_counted_ptr_inplace<lsst::qserv::ccontrol::UserQuerySelect, std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, (__gnu_cxx::_Lock_policy)2><std::allocator<lsst::qserv::ccontrol::UserQuerySelect> const, std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&>&&)...) (__a=..., __p=0x180ea70)
          at /usr/include/c++/4.9/bits/alloc_traits.h:253
      #9  0x00007f4bf1282887 in std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<lsst::qserv::ccontrol::UserQuerySelect, std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, (__gnu_cxx::_Lock_policy)2> > >::construct<std::_Sp_counted_ptr_inplace<lsst::qserv::ccontrol::UserQuerySelect, std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, (__gnu_cxx::_Lock_policy)2><std::allocator<lsst::qserv::ccontrol::UserQuerySelect> const, std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&> >(std::allocator<std::_Sp_counted_ptr_inplace<lsst::qserv::ccontrol::UserQuerySelect, std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, (__gnu_cxx::_Lock_policy)2> >&, std::_Sp_counted_ptr_inplace<lsst::qserv::ccontrol::UserQuerySelect, std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, (__gnu_cxx::_Lock_policy)2><std::allocator<lsst::qserv::ccontrol::UserQuerySelect> const, std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&>*, (std::_Sp_counted_ptr_inplace<lsst::qserv::ccontrol::UserQuerySelect, std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, (__gnu_cxx::_Lock_policy)2><std::allocator<lsst::qserv::ccontrol::UserQuerySelect> const, std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, 
std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&>&&)...) (__a=..., __p=0x180ea70)
          at /usr/include/c++/4.9/bits/alloc_traits.h:399
      #10 0x00007f4bf1281b27 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<lsst::qserv::ccontrol::UserQuerySelect, std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&> (
          this=0x7ffc21ece928, __a=...) at /usr/include/c++/4.9/bits/shared_ptr_base.h:619
      #11 0x00007f4bf1280e15 in std::__shared_ptr<lsst::qserv::ccontrol::UserQuerySelect, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&> (
          this=0x7ffc21ece920, __tag=..., __a=...) at /usr/include/c++/4.9/bits/shared_ptr_base.h:1090
      #12 0x00007f4bf1280388 in std::shared_ptr<lsst::qserv::ccontrol::UserQuerySelect>::shared_ptr<std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&> (this=0x7ffc21ece920, __tag=..., __a=...)
          at /usr/include/c++/4.9/bits/shared_ptr.h:316
      #13 0x00007f4bf127f778 in std::allocate_shared<lsst::qserv::ccontrol::UserQuerySelect, std::allocator<lsst::qserv::ccontrol::UserQuerySelect>, std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&> (__a=...)
          at /usr/include/c++/4.9/bits/shared_ptr.h:588
      #14 0x00007f4bf127eafb in std::make_shared<lsst::qserv::ccontrol::UserQuerySelect, std::shared_ptr<lsst::qserv::qproc::QuerySession>&, std::shared_ptr<lsst::qserv::qdisp::MessageStore>&, std::shared_ptr<lsst::qserv::qdisp::Executive>&, std::shared_ptr<lsst::qserv::rproc::InfileMergerConfig>&, std::shared_ptr<lsst::qserv::qproc::SecondaryIndex>&, std::shared_ptr<lsst::qserv::qmeta::QMeta>&, unsigned int&, std::string&> () at /usr/include/c++/4.9/bits/shared_ptr.h:604
      #15 0x00007f4bf127cf8d in lsst::qserv::ccontrol::UserQueryFactory::newUserQuery (this=0x17c79a0, query="SELECT * FROM R LIMIT 1", defaultDb="")
          at core/modules/ccontrol/UserQueryFactory.cc:126
      Python Exception <type 'exceptions.ValueError'> Cannot find type const std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::_Rep_type: 
      #16 0x00007f4bf129f3aa in lsst::qserv::czar::Czar::submitQuery (this=0x17c6df0, query="SELECT * FROM R LIMIT 1", hints=std::map with 3 elements)
          at core/modules/czar/Czar.cc:125
      Python Exception <type 'exceptions.ValueError'> Cannot find type const std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::_Rep_type: 
      #17 0x00007f4bf17e778a in lsst::qserv::proxy::submitQuery (query="SELECT * FROM R LIMIT 1", hints=std::map with 3 elements) at core/modules/proxy/czarProxy.cc:102
      #18 0x00007f4bf17f5556 in _wrap_submitQuery (L=0x17703c0) at build/proxy/czarProxy_wrap.c++:4177
      #19 0x00007f4bf38e50c4 in luaD_precall () from /qserv/stack/Linux64/mysqlproxy/0.8.5+12/lib/libmysql-chassis.so.0
      #20 0x00007f4bf38e54a4 in luaD_call () from /qserv/stack/Linux64/mysqlproxy/0.8.5+12/lib/libmysql-chassis.so.0
      #21 0x00007f4bf38e487b in luaD_rawrunprotected () from /qserv/stack/Linux64/mysqlproxy/0.8.5+12/lib/libmysql-chassis.so.0
      #22 0x00007f4bf38e565b in luaD_pcall () from /qserv/stack/Linux64/mysqlproxy/0.8.5+12/lib/libmysql-chassis.so.0
      #23 0x00007f4bf38e2ebc in lua_pcall () from /qserv/stack/Linux64/mysqlproxy/0.8.5+12/lib/libmysql-chassis.so.0
      #24 0x00007f4bf38f32f8 in luaB_pcall () from /qserv/stack/Linux64/mysqlproxy/0.8.5+12/lib/libmysql-chassis.so.0
      #25 0x00007f4bf38e50c4 in luaD_precall () from /qserv/stack/Linux64/mysqlproxy/0.8.5+12/lib/libmysql-chassis.so.0
      #26 0x00007f4bf38ee26a in luaV_execute () from /qserv/stack/Linux64/mysqlproxy/0.8.5+12/lib/libmysql-chassis.so.0
      #27 0x00007f4bf38e54ed in luaD_call () from /qserv/stack/Linux64/mysqlproxy/0.8.5+12/lib/libmysql-chassis.so.0
      #28 0x00007f4bf38e487b in luaD_rawrunprotected () from /qserv/stack/Linux64/mysqlproxy/0.8.5+12/lib/libmysql-chassis.so.0
      #29 0x00007f4bf38e565b in luaD_pcall () from /qserv/stack/Linux64/mysqlproxy/0.8.5+12/lib/libmysql-chassis.so.0
      #30 0x00007f4bf38e2ebc in lua_pcall () from /qserv/stack/Linux64/mysqlproxy/0.8.5+12/lib/libmysql-chassis.so.0
      #31 0x00007f4bf1a08fa6 in proxy_lua_read_query (con=con@entry=0x176c470) at proxy-plugin.c:1227
      #32 0x00007f4bf1a09155 in proxy_read_query (chas=chas@entry=0x1758620, con=con@entry=0x176c470) at proxy-plugin.c:1334
      #33 0x00007f4bf36b4c4d in plugin_call (srv=0x1758620, con=0x176c470, state=<optimized out>) at network-mysqld.c:892
      #34 0x00007f4bf36b6283 in network_mysqld_con_handle (event_fd=11, events=2, user_data=0x176c470) at network-mysqld.c:1617
      #35 0x00007f4bf2958ed0 in event_process_active_single_queue (activeq=<optimized out>, base=<optimized out>) at event.c:1325
      #36 event_process_active (base=<optimized out>) at event.c:1392
      #37 event_base_loop (base=0x1769f40, flags=flags@entry=0) at event.c:1589
      #38 0x00007f4bf2959b87 in event_base_dispatch (event_base=<optimized out>) at event.c:1420
      #39 0x00007f4bf38dfa0a in chassis_event_thread_loop (event_thread=0x1769e90) at chassis-event-thread.c:466
      #40 0x00007f4bf38df496 in chassis_mainloop (_chas=0x1758620) at chassis-mainloop.c:359
      #41 0x0000000000402a09 in main_cmdline (argc=1, argv=0x7ffc21ed1d78) at mysql-proxy-cli.c:597
      #42 0x00007f4bf20a1b45 in __libc_start_main (main=0x401db0 <main>, argc=6, argv=0x7ffc21ed1d78, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
          stack_end=0x7ffc21ed1d68) at libc-start.c:287
      #43 0x0000000000401dde in _start ()
      

            Activity

            gapon Igor Gaponenko added a comment -

            The issue has been analysed, and a newer version of the Docker container with a proper fix (as per DM-7380) has been deployed on the PDAC cluster. This has resolved the issue. Leaving this ticket as is, since it may serve as a source of valuable information on how to recognize and diagnose this kind of problem in the PDAC production environment.

            Many thanks to John Gates for the help with this problem!

            jgates John Gates added a comment -

            I believe this was changed in DM-7380 so that it would check if the Executive was null before calling the function. This was a quick/basic fix in that what is really happening is that the constructor for UserQuerySelect is doing too much and ends up registering malformed queries with the metadata server. Registration should happen after construction only if the query is reasonably valid.
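
            A null check of the kind described above might look like the following. This is only a sketch and not the actual DM-7380 change: the Executive class here is a tiny stand-in for qdisp::Executive, and setQueryIdIfPresent is a hypothetical helper name.

```cpp
#include <cassert>
#include <memory>

// Simplified stand-in for qdisp::Executive; the real class is far larger.
class Executive {
public:
    void setQueryId(int id) { _id = id; }
    int queryId() const { return _id; }
private:
    int _id = 0;
};

// Forward the query ID only when an Executive actually exists.
// Returns false for an empty pointer so the caller can report an error
// instead of calling a member function on a null object.
inline bool setQueryIdIfPresent(std::shared_ptr<Executive> const& executive, int id) {
    if (!executive) return false;
    executive->setQueryId(id);
    return true;
}
```

            In the backtrace of this ticket the executive argument was an empty shared_ptr, which is the case the guard turns into a false return value rather than a segmentation fault.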

            gapon Igor Gaponenko added a comment -

            (For the record, as this has already been discussed elsewhere) Following recommendations made by @fjammes I checked out the latest version of qserv and put it in my home folder on lsst-qserv-master01:

            [gapon@lsst-qserv-master01 ~]$ ls -al /home/gapon/development/qserv/admin/tools/docker/deployment/ncsa/env.sh 
            -rwxr-xr-x 1 gapon grp_202 1126 Oct 27 13:35 /home/gapon/development/qserv/admin/tools/docker/deployment/ncsa/env.sh
             
            [gapon@lsst-qserv-master01 ~]$ cat /home/gapon/development/qserv/admin/tools/docker/deployment/ncsa/env.sh | grep BRANCH
            BRANCH=dev
            

            And these are the images and containers which are available and running on the node:

            [gapon@lsst-qserv-master01 ~]$ docker images
            REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
            qserv/qserv         dev_master          d662ad3687a3        6 weeks ago         3.295 GB
             
            [gapon@lsst-qserv-master01 ~]$ docker ps
            CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS               NAMES
            11220f48d73c        qserv/qserv:dev_master   "/bin/sh -c /qserv/sc"   20 hours ago        Up 20 hours                             qserv
            

            I think @fjammes was the one who installed that image.

            jgates John Gates added a comment -

            What version of the software is this using? I believe I put in a temporary fix for this on Sept 19 until we could pull a bunch of stuff out of the UserQuerySelect constructor.
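
            One possible shape for pulling work out of a constructor is to make registration an explicit second step that a factory performs only for valid queries. The sketch below assumes entirely hypothetical names (QueryRegistry, SketchUserQuery, makeUserQuery) and does not reflect how UserQuerySelect or the metadata service are actually implemented.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the query metadata service.
struct QueryRegistry {
    std::vector<std::string> registered;
    // Record the query text and return a 1-based query ID.
    int add(std::string const& query) {
        registered.push_back(query);
        return static_cast<int>(registered.size());
    }
};

// Hypothetical, heavily simplified user query: the constructor only stores state.
class SketchUserQuery {
public:
    SketchUserQuery(std::string query, std::string error)
        : _query(std::move(query)), _error(std::move(error)) {}
    bool isValid() const { return _error.empty(); }
    int queryId() const { return _queryId; }
    // Registration is an explicit step, separate from construction.
    void registerWith(QueryRegistry& registry) { _queryId = registry.add(_query); }
private:
    std::string _query;
    std::string _error;
    int _queryId = 0;  // 0 means "never registered"
};

// Factory: construct first, then register only if the query is valid.
inline std::shared_ptr<SketchUserQuery> makeUserQuery(std::string const& query,
                                                      std::string const& error,
                                                      QueryRegistry& registry) {
    auto uq = std::make_shared<SketchUserQuery>(query, error);
    if (uq->isValid()) uq->registerWith(registry);
    return uq;
}
```

            With this split, a query that fails analysis is still returned to the caller, carrying its error, but it never reaches the registry.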


              People

              • Assignee:
                gapon Igor Gaponenko
                Reporter:
                gapon Igor Gaponenko
                Watchers:
                Fritz Mueller, Igor Gaponenko, John Gates