  1. Data Management
  2. DM-15593

daf_persistence segfaults on Princeton tiger2 cluster

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Building daf_persistence on tiger2-sumire fails as follows:

      $ exec scl enable devtoolset-6 bash                                                                                                          
      $ bash newinstall.sh
       
        [elided]
       
      $ . loadLSST.bash 
      $ eups distrib install -t w_2018_34 daf_persistence
       
         [elided]
       
        [ 34/34 ]  daf_persistence 16.0-3-g3806c63+6 ... 
       
      ***** error: from /scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/build.log:
      Coverage.py warning: No data was collected. (no-data-collected)
      Global pytest run completed successfully
      Failed test output:
      tests/Persistence_3
       
      Running 1 test case...
       
      *** No errors detected
      tests/PropertySet_2
       
      Running 1 test case...
       
      *** No errors detected
      The following tests failed:
      /scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/daf_persistence-16.0-3-g3806c63+6/tests/.tests/Persistence_3.
      failed
      /scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/daf_persistence-16.0-3-g3806c63+6/tests/.tests/PropertySet_2.
      failed
      2 tests failed
      scons: *** [checkTestStatus] Error 1
      scons: building terminated because of errors.
      + exit -4
      eups distrib: Failed to build daf_persistence-16.0-3-g3806c63+6.eupspkg: Command:
              source "/scratch/swinbank/stack_master/eups/2.1.4/bin/setups.sh"; export EUPS_PATH="/scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb"; (/scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/build.sh) >> /scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/build.log 2>&1 4>/scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/build.msg
      exited with code 252
      

      This means that the shared stack on Tiger is not currently being updated.
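
      A note on the two status values in the log above: the build script ends with `exit -4`, while eups reports `exited with code 252`. These are most likely the same status, since POSIX keeps only the low 8 bits of the value passed to exit. The sketch below illustrates the arithmetic; the function name is hypothetical and it assumes eups simply reports the status it received from the shell.

      ```python
      def reported_exit_status(requested):
          """Exit status a parent process sees for a child that asked for `requested`.

          POSIX keeps only the low 8 bits of the value passed to exit().
          """
          return requested & 0xFF


      # The build script ran `exit -4`; eups then reported code 252.
      print(reported_exit_status(-4))
      ```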

            Activity

            swinbank John Swinbank added a comment -

            Confirmed that w_2018_33 builds fine on Tiger2. That suggests this is related to the Boost upgrade in DM-15385.

            swinbank John Swinbank added a comment -

            Also confirmed that w_2018_34 fails on Perseus. Curiously, it works on lsst-dev01, which ought to be a very similar operating system and is certainly using the same (devtoolset-6) toolchain.

            swinbank John Swinbank added a comment -

            Error is related to redirecting output to a file:

            [swinbank@tiger2-sumire daf_persistence ((w.2018.34))]$ ./tests/Persistence_3 
            Running 1 test case...
             
            *** No errors detected
             
            $ ./tests/Persistence_3 > log
             
            *** No errors detected
            Segmentation fault
            

            But not, interestingly enough, when piping it to another process:

            $ ./tests/Persistence_3 | cat
             
            *** No errors detected
            Running 1 test case...
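
            The three cases above (terminal, file, pipe) can be checked in one go with a small harness. This is a minimal sketch, not part of the original investigation: the function names are hypothetical, and it relies only on the documented behaviour of Python's `subprocess` module, where a negative return code on POSIX means the child was killed by that signal.

            ```python
            import signal
            import subprocess
            import tempfile


            def describe(returncode):
                """Turn a subprocess return code into a short human-readable string."""
                if returncode < 0:
                    # On POSIX, a negative code means the child was killed by that signal.
                    return f"killed by {signal.Signals(-returncode).name}"
                return f"exit {returncode}"


            def run_with_stdout_modes(cmd):
                """Run cmd three times, varying only where its stdout goes."""
                results = {}
                # 1. stdout inherited from the parent (a terminal when run interactively).
                results["inherit"] = describe(subprocess.run(cmd).returncode)
                # 2. stdout redirected to a regular file, like `cmd > log`.
                with tempfile.TemporaryFile() as log:
                    results["file"] = describe(subprocess.run(cmd, stdout=log).returncode)
                # 3. stdout connected to a pipe, like `cmd | cat`.
                results["pipe"] = describe(
                    subprocess.run(cmd, stdout=subprocess.PIPE).returncode
                )
                return results
            ```

            Calling `run_with_stdout_modes(["./tests/Persistence_3"])` from the package directory would, going by the transcript above, be expected to report a SIGSEGV only in the file case on the affected hosts.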
            

            swinbank John Swinbank added a comment -

            Error is coming from the guts of Boost:

            (gdb) set args > log
            (gdb) run
            Starting program: /scratch/swinbank/daf_persistence/tests/Persistence_3 > log
            Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64
            [Thread debugging using libthread_db enabled]
            Using host libthread_db library "/lib64/libthread_db.so.1".
             
            *** No errors detected
             
            Program received signal SIGSEGV, Segmentation fault.
            0x00002aaaaaff7cd5 in boost::serialization::typeid_system::extended_type_info_typeid_0::type_unregister() ()
               from /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/lib/libboost_serialization.so.1.68.0
            Missing separate debuginfos, use: debuginfo-install expat-2.1.0-10.el7_3.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 libgfortran-4.8.5-28.el7_5.1.x86_64 libicu-50.1.2-15.el7.x86_64 libquadmath-4.8.5-28.el7_5.1.x86_64 libstdc++-4.8.5-28.el7_5.1.x86_64 libuuid-2.23.2-52.el7.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64 openblas-devel-0.2.20-6.sdl7.x86_64
            (gdb) bt
            #0  0x00002aaaaaff7cd5 in boost::serialization::typeid_system::extended_type_info_typeid_0::type_unregister() ()
               from /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/lib/libboost_serialization.so.1.68.0
            #1  0x0000000000410088 in boost::serialization::extended_type_info_typeid<lsst::daf::base::Persistable>::~extended_type_info_typeid (
                this=0x61bb80 <boost::serialization::singleton<boost::serialization::extended_type_info_typeid<lsst::daf::base::Persistable> >::get_instance()::t>, 
                __in_chrg=<optimized out>)
                at /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/include/boost/serialization/extended_type_info_typeid.hpp:96
            #2  boost::serialization::singleton<boost::serialization::extended_type_info_typeid<lsst::daf::base::Persistable> >::get_instance()::singleton_wrapper::~singleton_wrapper() (
                this=0x61bb80 <boost::serialization::singleton<boost::serialization::extended_type_info_typeid<lsst::daf::base::Persistable> >::get_instance()::t>, 
                __in_chrg=<optimized out>) at /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/include/boost/serialization/singleton.hpp:121
            #3  0x00002aaaad80cb69 in __run_exit_handlers () from /lib64/libc.so.6
            #4  0x00002aaaad80cbb7 in exit () from /lib64/libc.so.6
            #5  0x00002aaaad7f53dc in __libc_start_main () from /lib64/libc.so.6
            #6  0x0000000000409e3b in _start ()
            (gdb) 
            

            tjenness Tim Jenness added a comment -

            There are 7 C++ tests in daf_persistence but only 2 of them failed. PropertySet_2 and Persistence_3 are the only two tests that include boost/serialization/export.hpp. Persistence_1 and Persistence_2 use the same serialization code without including the header file and they work. I don't suppose valgrind is showing any oddities between Persistence_1 that works and Persistence_3 that fails?

            swinbank John Swinbank added a comment - - edited

            Valgrind was an interesting suggestion, Tim Jenness. On the Princeton machines, where I'm getting segfaults, tests/Persistence_3 generates a lot of chatter from Valgrind:

            $ valgrind tests/Persistence_3
            ==235923== Memcheck, a memory error detector
            ==235923== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
            ==235923== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
            ==235923== Command: tests/Persistence_3
            ==235923==
            Running 1 test case...
             
            *** No errors detected
            ==235923== Invalid read of size 8
            ==235923==    at 0x515ECD1: boost::serialization::typeid_system::extended_type_info_typeid_0::type_unregister() (in /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/lib/libboost_serialization.so.1.68.0)
            ==235923==    by 0x410087: boost::serialization::singleton<boost::serialization::extended_type_info_typeid<lsst::daf::base::Persistable> >::get_instance()::singleton_wrapper::~singleton_wrapper() (extended_type_info_typeid.hpp:96)
            ==235923==    by 0x7973B68: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
            ==235923==    by 0x7973BB6: exit (in /usr/lib64/libc-2.17.so)
            ==235923==    by 0x795C3DB: (below main) (in /usr/lib64/libc-2.17.so)
            ==235923==  Address 0xaf959b0 is 32 bytes inside a block of size 40 free'd
            ==235923==    at 0x4C2B16D: operator delete(void*) (vg_replace_malloc.c:576)
            ==235923==    by 0x515F2D7: boost::serialization::singleton<std::multiset<boost::serialization::typeid_system::extended_type_info_typeid_0 const*, boost::serialization::typeid_system::type_compare, std::allocator<boost::serialization::typeid_system::extended_type_info_typeid_0 const*> > >::get_instance()::singleton_wrapper::~singleton_wrapper() (in /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/lib/libboost_serialization.so.1.68.0)
            ==235923==    by 0x7973ED9: __cxa_finalize (in /usr/lib64/libc-2.17.so)
            ==235923==    by 0x5157402: ??? (in /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/lib/libboost_serialization.so.1.68.0)
            ==235923==    by 0x4010189: _dl_fini (in /usr/lib64/ld-2.17.so)
            ==235923==    by 0x7973B68: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
            ==235923==    by 0x7973BB6: exit (in /usr/lib64/libc-2.17.so)
            ==235923==    by 0x795C3DB: (below main) (in /usr/lib64/libc-2.17.so)
            ==235923==  Block was alloc'd at
            ==235923==    at 0x4C2A1E3: operator new(unsigned long) (vg_replace_malloc.c:334)
            ==235923==    by 0x515EBEF: boost::serialization::typeid_system::extended_type_info_typeid_0::type_register(std::type_info const&) (in /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/lib/libboost_serialization.so.1.68.0)
            ==235923==    by 0x412EBA: extended_type_info_typeid (extended_type_info_typeid.hpp:91)
            ==235923==    by 0x412EBA: singleton_wrapper (singleton.hpp:121)
            ==235923==    by 0x412EBA: boost::serialization::singleton<boost::serialization::extended_type_info_typeid<lsst::daf::base::Persistable> >::get_instance() (singleton.hpp:122)
            ==235923==    by 0x4EF24B0: get_const_instance (singleton.hpp:148)
            ==235923==    by 0x4EF24B0: void_caster_primitive (void_cast.hpp:183)
            ==235923==    by 0x4EF24B0: singleton_wrapper (singleton.hpp:121)
            ==235923==    by 0x4EF24B0: boost::serialization::singleton<boost::serialization::void_cast_detail::void_caster_primitive<lsst::daf::base::PropertySet, lsst::daf::base::Persistable> >::get_instance() (singleton.hpp:122)
            ==235923==    by 0x4EC7E4F: __static_initialization_and_destruction_0 (singleton.hpp:162)
            ==235923==    by 0x4EC7E4F: _GLOBAL__sub_I_PropertySetFormatter.cc (PropertySetFormatter.cc:161)
            ==235923==    by 0x400FAC2: _dl_init (in /usr/lib64/ld-2.17.so)
            ==235923==    by 0x4001029: ??? (in /usr/lib64/ld-2.17.so)
             
             [....]
             
            ==235923==
            ==235923== HEAP SUMMARY:
            ==235923==     in use at exit: 5,173 bytes in 26 blocks
            ==235923==   total heap usage: 5,028 allocs, 5,003 frees, 540,101 bytes allocated
            ==235923==
            ==235923== LEAK SUMMARY:
            ==235923==    definitely lost: 0 bytes in 0 blocks
            ==235923==    indirectly lost: 0 bytes in 0 blocks
            ==235923==      possibly lost: 0 bytes in 0 blocks
            ==235923==    still reachable: 5,173 bytes in 26 blocks
            ==235923==                       of which reachable via heuristic:
            ==235923==                         stdstring          : 168 bytes in 4 blocks
            ==235923==         suppressed: 0 bytes in 0 blocks
            ==235923== Rerun with --leak-check=full to see details of leaked memory
            ==235923==
            ==235923== For counts of detected and suppressed errors, rerun with: -v
            ==235923== ERROR SUMMARY: 12 errors from 12 contexts (suppressed: 0 from 0)
            

            (A bunch of output elided in the middle there to keep Jira happy; full output at https://gist.github.com/jdswinbank/805f9c392a6ba8290e962f8501d52aac).

            However, so does tests/Persistence_1: I've not compared in detail, but they both report “12 errors from 12 contexts” (although Persistence_1 doesn't segfault).

            Dropping back to w_2018_33, both come up Valgrind-clean.

            However, the plot thickens: I see exactly the same thing on lsst-dev01. That is, on lsst-dev01, Valgrind reports a bunch of errors in the daf_persistence test suite for w_2018_34, although w_2018_33 was clean. This looks to me like a regression in Boost, and we're just getting lucky that we're not seeing more segfaults on other systems.

            I note that there are no changes advertised to boost_serialization between 1.66 and 1.68 (according to the release notes), so this seems like either a Boost bug or (possibly) a problem caused by us using private APIs which have been changed without an announcement being made (I've no idea if that's something we're doing: I've no expertise in boost_serialization, and not enough time to dive in and check).
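
            To compare Valgrind runs across tests or weeklies without reading the logs by eye, the summary line and the individual reports can be pulled out mechanically. The following is a minimal sketch that assumes the Memcheck output format shown in the log above; the function names are hypothetical, and the tolerance for thousands separators in the counts is a precaution rather than something confirmed here.

            ```python
            import re

            # Matches the final summary line Memcheck prints, for example:
            # "==235923== ERROR SUMMARY: 12 errors from 12 contexts (suppressed: 0 from 0)"
            # Leading whitespace is allowed so that indented, pasted logs also match.
            SUMMARY = re.compile(
                r"^\s*==\d+== ERROR SUMMARY: ([\d,]+) errors from ([\d,]+) contexts",
                re.MULTILINE,
            )
            INVALID_READ = re.compile(r"^\s*==\d+== Invalid read of size \d+", re.MULTILINE)


            def error_summary(log_text):
                """Return (errors, contexts) from a Memcheck log, or None if absent."""
                match = SUMMARY.search(log_text)
                if match is None:
                    return None
                errors, contexts = (int(g.replace(",", "")) for g in match.groups())
                return errors, contexts


            def count_invalid_reads(log_text):
                """Count 'Invalid read' reports, the error kind shown in the trace above."""
                return len(INVALID_READ.findall(log_text))
            ```

            Feeding it the saved output of `valgrind tests/Persistence_1` and `valgrind tests/Persistence_3` for each weekly would give a quick table of error counts to set side by side.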

            swinbank John Swinbank added a comment -

            As Jim Bosch points out OOB, the failing code should be removed as part of the fallout from RFC-482. Given that, I propose to take no more action here, but will block this ticket on DM-14504 and hope that gets done soon (it won't be before w_2018_36, but if we're lucky it might be in that weekly).

            tjenness Tim Jenness added a comment -

            Tiago Ribeiro seems to be having the same problem, but on CentOS7 with GCC7 compilers. John Swinbank, what OS is Tiger running?

            tjenness Tim Jenness added a comment -

            Tiago Ribeiro reports that it built fine for him with the devtoolset-6 compilers.

            swinbank John Swinbank added a comment -

            [swinbank@tiger2-sumire ~]$ lsb_release -a
            LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
            Distributor ID: Springdale
            Description:    Springdale Linux release 7.5 (Verona)
            Release:        7.5
            Codename:       Verona
            

            Springdale? Thanks Princeton. I believe that Springdale 7.5 is effectively a recompilation from scratch of the RHEL 7.5 source — it should be equivalent to, but not the same as, CentOS 7.5.

            Even on systems where this doesn't happen to segfault, Valgrind is still showing problems; switching toolchains might buy some temporary respite, but this code is still rotten and could spontaneously explode when we're not looking.

            jbosch Jim Bosch added a comment -

            I believe all of the failing tests were removed on DM-15767.

            tjenness Tim Jenness added a comment -

            Looks like they are gone. Should we mark as INVALID?

            swinbank John Swinbank added a comment -

            Let's check and confirm that we are now a) not segfaulting and b) Valgrind clean, then we can get rid of this ticket.

            swinbank John Swinbank added a comment -

            Confirmed that w_2018_42 is now up & running on Tiger2. Of course, since the code no longer exists I can't actually check that it's Valgrind clean, but I think we can regard this as done.


              People

              Assignee:
              swinbank John Swinbank
              Reporter:
              swinbank John Swinbank
              Watchers:
              Jim Bosch, John Swinbank, Tim Jenness
