Data Management / DM-26575

Worker crash in ~MySqlConfig

    Details

    • Type: Story
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
      None
    • Story Points:
      4
    • Sprint:
      DB_F20_09, DB_S21_12, DB_F21_06
    • Team:
      Data Access and Database
    • Urgent?:
      No

      Description

      A worker crashed while dispatching 20 concurrent "medium" scan queries against the WISE dataset on the large Qserv cluster at NCSA. The containers were running qserv/qserv:tickets_DM-26207. The stack trace follows:

      Program terminated with signal SIGSEGV, Segmentation fault.
      #0 0x00007f33d501a0f3 in __gnu_cxx::__exchange_and_add (__val=-1, __mem=0x7f33d5258358)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/ext/atomicity.h:49
      49	 { return __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL); }
      [Current thread is 1 (Thread 0x7f334f7fe700 (LWP 448172))]
      (gdb) bt
      #0 0x00007f33d501a0f3 in __gnu_cxx::__exchange_and_add (__val=-1, __mem=0x7f33d5258358)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/ext/atomicity.h:49
      #1 __gnu_cxx::__exchange_and_add_dispatch (__val=-1, __mem=0x7f33d5258358)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/ext/atomicity.h:82
      #2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7f33d5258350)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/bits/shared_ptr_base.h:151
      #3 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7f3264719d20, __in_chrg=<optimized out>)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/bits/shared_ptr_base.h:684
      #4 std::__shared_ptr<lsst::qserv::sql::SqlConnection, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (
        this=0x7f3264719d18, __in_chrg=<optimized out>)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/bits/shared_ptr_base.h:1123
      #5 std::shared_ptr<lsst::qserv::sql::SqlConnection>::~shared_ptr (this=0x7f3264719d18, __in_chrg=<optimized out>)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/bits/shared_ptr.h:93
      #6 lsst::qserv::mysql::MySqlConfig::~MySqlConfig (this=0x7f3264719c70, __in_chrg=<optimized out>)
        at core/modules/mysql/MySqlConfig.h:50
      #7 lsst::qserv::wdb::QueryRunner::~QueryRunner (this=0x7f3264719c10, __in_chrg=<optimized out>)
        at core/modules/wdb/QueryRunner.cc:537
      #8 0x00007f33d501a31b in lsst::qserv::wdb::QueryRunner::~QueryRunner (this=0x7f3264719c10, __in_chrg=<optimized out>)
        at core/modules/wdb/QueryRunner.cc:538
      #9 0x00007f33d5021885 in std::_Sp_counted_ptr<lsst::qserv::wdb::QueryRunner*, (__gnu_cxx::_Lock_policy)2>::_M_dispose
        (this=<optimized out>)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/bits/shared_ptr_base.h:376
      #10 0x00007f33d50084ea in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7f3264675ba0)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/bits/shared_ptr_base.h:154
      #11 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7f334f7fdbb8, __in_chrg=<optimized out>)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/bits/shared_ptr_base.h:684
      #12 std::__shared_ptr<lsst::qserv::wdb::QueryRunner, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7f334f7fdbb0, 
        __in_chrg=<optimized out>)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/bits/shared_ptr_base.h:1123
      #13 std::shared_ptr<lsst::qserv::wdb::QueryRunner>::~shared_ptr (this=0x7f334f7fdbb0, __in_chrg=<optimized out>)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/bits/shared_ptr.h:93
      #14 lsst::qserv::wcontrol::Foreman::<lambda(lsst::qserv::util::CmdData*)>::operator()(lsst::qserv::util::CmdData *) const (__closure=0x7f3344963b00) at core/modules/wcontrol/Foreman.cc:111
      #15 0x00007f33d500880d in std::_Function_handler<void(lsst::qserv::util::CmdData*), lsst::qserv::wcontrol::Foreman::processTask(const std::shared_ptr<lsst::qserv::wbase::Task>&)::<lambda(lsst::qserv::util::CmdData*)> >::_M_invoke(const std::_Any_data &, lsst::qserv::util::CmdData *&&) (__functor=..., __args#0=<optimized out>)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/bits/std_function.h:316
      #16 0x00007f33d500541c in std::function<void (lsst::qserv::util::CmdData*)>::operator()(lsst::qserv::util::CmdData*) const (this=<optimized out>, __args#0=<optimized out>)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/bits/std_function.h:706
      #17 0x00007f33d5005433 in lsst::qserv::util::Command::action (this=<optimized out>, data=<optimized out>)
        at core/modules/util/Command.h:77
      #18 0x00007f33d4f5524d in lsst::qserv::util::Command::runAction (data=0x7f32740aa290, this=<optimized out>)
        at core/modules/util/Command.h:81
      #19 lsst::qserv::util::EventThread::handleCmds (this=0x7f32740aa290) at core/modules/util/EventThread.cc:55
      #20 0x00007f33d4f577d6 in std::__invoke_impl<void, void (lsst::qserv::util::EventThread::*)(), lsst::qserv::util::EventThread*> (__t=<optimized out>, __f=<optimized out>)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/bits/invoke.h:73
      #21 std::__invoke<void (lsst::qserv::util::EventThread::*)(), lsst::qserv::util::EventThread*> (__fn=<optimized out>)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/bits/invoke.h:95
      #22 std::thread::_Invoker<std::tuple<void (lsst::qserv::util::EventThread::*)(), lsst::qserv::util::EventThread*> >::_M_invoke<0ul, 1ul> (this=<optimized out>)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/thread:234
      #23 std::thread::_Invoker<std::tuple<void (lsst::qserv::util::EventThread::*)(), lsst::qserv::util::EventThread*> >::operator() (this=<optimized out>)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/thread:243
      #24 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (lsst::qserv::util::EventThread::*)(), lsst::qserv::util::EventThread*> > >::_M_run (this=<optimized out>)
        at /qserv/stack/conda/miniconda3-py37_4.8.2/envs/lsst-scipipe-ceb6bb6/x86_64-conda_cos6-linux-gnu/include/c++/7.5.0/thread:186
      #25 0x00007f33e3730163 in std::execute_native_thread_routine (__p=0x7f327400a6b0)
        at /home/conda/feedstock_root/build_artifacts/ctng-compilers_1578638331887/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/src/c++11/thread.cc:80
      #26 0x00007f33e37e3e65 in start_thread () from /lib64/libpthread.so.0
      #27 0x00007f33e339888d in clone () from /lib64/libc.so.6
      

          Activity

          jgates John Gates added a comment -

          It looks like this may be caused by a default copy constructor for MySqlConfig and its `std::shared_ptr<sql::SqlConnection> _sqlConnection;` member variable.
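
For illustration only (this is not the Qserv source): a minimal sketch of what an implicitly generated copy constructor does with a `std::shared_ptr` member. `Connection` and `Config` are hypothetical stand-ins for `sql::SqlConnection` and `MySqlConfig`, and the field names are assumptions. Each copy shares the same control block, and each destructor decrements its reference count, which is the `_M_release` call at frame #2 of the trace. The sketch does not reproduce the crash; it only shows the ownership pattern the comment above refers to.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for sql::SqlConnection.
struct Connection {
    std::string host;
};

// Hypothetical stand-in for MySqlConfig. No copy constructor is declared,
// so the compiler generates one that copies every member, including the
// shared_ptr (which bumps the shared reference count).
struct Config {
    std::string username;
    std::shared_ptr<Connection> conn;
};

// Number of owners seen while both the original and one copy are alive.
long sharedOwnersAfterCopy() {
    Config original{"qsmaster", std::make_shared<Connection>()};
    Config copy = original;  // implicit copy constructor
    assert(copy.conn.get() == original.conn.get());  // same Connection object
    return copy.conn.use_count();
}

// Number of owners left once the copy has been destroyed.
long sharedOwnersAfterCopyDestroyed() {
    Config original{"qsmaster", std::make_shared<Connection>()};
    {
        Config copy = original;
    }  // ~Config runs here and releases one reference
    return original.conn.use_count();
}
```

With both objects alive, `sharedOwnersAfterCopy()` returns 2; after the copy goes out of scope, `sharedOwnersAfterCopyDestroyed()` returns 1. If a copy were ever made from an object whose `shared_ptr` was in an invalid state, the release in the destructor would touch a bad control block, which would be consistent with the faulting frame.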

          jgates John Gates added a comment -

          The _sqlConnection member is completely unused, so I am removing it and its associated functions. All the other members are acceptable for the default constructor, so I am defining it explicitly.
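
A rough sketch of the shape of that change, assuming "default constructor" here means the compiler-generated copy operations. The class below is a simplified, hypothetical stand-in for `MySqlConfig`; its field names and constructor signature are illustrative and not taken from the Qserv source. With the `shared_ptr` member gone, every remaining member is a plain value type, so defaulted copying yields fully independent objects.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical, simplified stand-in for MySqlConfig after the change:
// no shared_ptr member, only value-type fields (names are illustrative).
class Config {
public:
    Config(std::string username_, std::string hostname_, unsigned int port_)
        : username(std::move(username_)),
          hostname(std::move(hostname_)),
          port(port_) {}

    // Explicitly defaulted copy operations: a member-wise copy is safe
    // because no member shares ownership of anything.
    Config(Config const&) = default;
    Config& operator=(Config const&) = default;

    std::string username;
    std::string hostname;
    unsigned int port;
};

static_assert(std::is_copy_constructible<Config>::value,
              "Config should remain copyable");

// Modifying a copy must not affect the original.
bool copyIsIndependent() {
    Config a("qsmaster", "localhost", 3306);
    Config b = a;
    b.hostname = "worker-1";
    return a.hostname == "localhost" && b.hostname == "worker-1";
}
```

Spelling out `= default` does not change behavior relative to the implicit version; it documents that member-wise copying is intended.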

          jgates John Gates added a comment -

          The changes were put into a separate ticket, DM-26579, and we will watch for segmentation faults associated with QueryRunner and MySqlConfig.


            People

            Assignee:
            jgates John Gates
            Reporter:
            fritzm Fritz Mueller
            Watchers:
            Fritz Mueller, John Gates
