Data Management / DM-15593

daf_persistence segfaults on Princeton tiger2 cluster

Details

    • Story
    • Status: Done
    • Resolution: Done

    Description

      Building daf_persistence on tiger2-sumire fails as follows:

      $ exec scl enable devtoolset-6 bash                                                                                                          
      $ bash newinstall.sh
       
        [elided]
       
      $ . loadLSST.bash 
      $ eups distrib install -t w_2018_34 daf_persistence
       
         [elided]
       
        [ 34/34 ]  daf_persistence 16.0-3-g3806c63+6 ... 
       
      ***** error: from /scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/build.log:
      Coverage.py warning: No data was collected. (no-data-collected)
      Global pytest run completed successfully
      Failed test output:
      tests/Persistence_3
       
      Running 1 test case...
       
      *** No errors detected
      tests/PropertySet_2
       
      Running 1 test case...
       
      *** No errors detected
      The following tests failed:
      /scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/daf_persistence-16.0-3-g3806c63+6/tests/.tests/Persistence_3.
      failed
      /scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/daf_persistence-16.0-3-g3806c63+6/tests/.tests/PropertySet_2.
      failed
      2 tests failed
      scons: *** [checkTestStatus] Error 1
      scons: building terminated because of errors.
      + exit -4
      eups distrib: Failed to build daf_persistence-16.0-3-g3806c63+6.eupspkg: Command:
              source "/scratch/swinbank/stack_master/eups/2.1.4/bin/setups.sh"; export EUPS_PATH="/scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb"; (/scratch/swinbank/sta
      ck_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/build.sh) >> /scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBui
      ldDir/Linux64/daf_persistence-16.0-3-g3806c63+6/build.log 2>&1 4>/scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c6
      3+6/build.msg 
      exited with code 252
      

      This means that the shared stack on Tiger is not currently being updated.
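A side note on the two exit codes at the end of the log: the build script ends with `exit -4`, while eups reports code 252. These are the same status, because a POSIX exit status keeps only the low 8 bits of the value passed to `exit`. A minimal sketch of that arithmetic (the function name is made up for illustration):

```python
def reported_exit_status(value: int) -> int:
    """Return the status a parent process observes when a child calls exit(value).

    POSIX wait statuses keep only the low 8 bits of the exit value.
    """
    return value & 0xFF


print(reported_exit_status(-4))  # the `exit -4` from the build log -> 252
```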

          Activity

            Confirmed that w_2018_33 builds fine on Tiger2. That suggests this is related to the Boost upgrade in DM-15385.

            swinbank John Swinbank added the comment above.

            Also confirmed that w_2018_34 fails on Perseus. Curiously, it works on lsst-dev01, which ought to be a very similar operating system and is certainly using the same (devtoolset-6) toolchain.

            swinbank John Swinbank added the comment above.

            Error is related to redirecting output to a file:

            [swinbank@tiger2-sumire daf_persistence ((w.2018.34))]$ ./tests/Persistence_3 
            Running 1 test case...
             
            *** No errors detected
             
            $ ./tests/Persistence_3 > log
             
            *** No errors detected
            Segmentation fault
            

            But not, interestingly enough, to another process:

            $ ./tests/Persistence_3 | cat
             
            *** No errors detected
            Running 1 test case...
            

            swinbank John Swinbank added the comment above.
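The file-versus-pipe comparison above can be scripted so it is easy to repeat on other hosts. The following is only a sketch: `run_with_stdout` and `describe` are hypothetical helper names, and it relies on the standard `subprocess` convention that a child killed by signal N is reported with return code -N on POSIX.

```python
import signal
import subprocess
import tempfile


def run_with_stdout(cmd, target):
    """Run cmd with stdout attached to a regular file or to a pipe.

    Returns the child's return code; on POSIX, death by signal N is
    reported by subprocess as -N.
    """
    if target == "file":
        # Equivalent in spirit to `cmd > log`.
        with tempfile.TemporaryFile() as out:
            return subprocess.run(cmd, stdout=out).returncode
    if target == "pipe":
        # Equivalent in spirit to `cmd | cat`.
        return subprocess.run(cmd, stdout=subprocess.PIPE).returncode
    raise ValueError(f"unknown target: {target!r}")


def describe(returncode):
    """Turn a subprocess return code into a short human-readable outcome."""
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"
```

Going by the transcript above, `describe(run_with_stdout(["./tests/Persistence_3"], "file"))` would be expected to report SIGSEGV on an affected host, while the `"pipe"` variant would report a clean exit. That expectation comes from the transcript, not from running this sketch.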

            Error is coming from the guts of Boost:

            (gdb) set args > log
            (gdb) run
            Starting program: /scratch/swinbank/daf_persistence/tests/Persistence_3 > log
            Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64
            [Thread debugging using libthread_db enabled]
            Using host libthread_db library "/lib64/libthread_db.so.1".
             
            *** No errors detected
             
            Program received signal SIGSEGV, Segmentation fault.
            0x00002aaaaaff7cd5 in boost::serialization::typeid_system::extended_type_info_typeid_0::type_unregister() ()
               from /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/lib/libboost_serialization.so.1.68.0
            Missing separate debuginfos, use: debuginfo-install expat-2.1.0-10.el7_3.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 libgfortran-4.8.5-28.el7_5.1.x86_64 libicu-50.1.2-15.el7.x86_64 libquadmath-4.8.5-28.el7_5.1.x86_64 libstdc++-4.8.5-28.el7_5.1.x86_64 libuuid-2.23.2-52.el7.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64 openblas-devel-0.2.20-6.sdl7.x86_64
            (gdb) bt
            #0  0x00002aaaaaff7cd5 in boost::serialization::typeid_system::extended_type_info_typeid_0::type_unregister() ()
               from /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/lib/libboost_serialization.so.1.68.0
            #1  0x0000000000410088 in boost::serialization::extended_type_info_typeid<lsst::daf::base::Persistable>::~extended_type_info_typeid (
                this=0x61bb80 <boost::serialization::singleton<boost::serialization::extended_type_info_typeid<lsst::daf::base::Persistable> >::get_instance()::t>, 
                __in_chrg=<optimized out>)
                at /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/include/boost/serialization/extended_type_info_typeid.hpp:96
            #2  boost::serialization::singleton<boost::serialization::extended_type_info_typeid<lsst::daf::base::Persistable> >::get_instance()::singleton_wrapper::~singleton_wrapper() (
                this=0x61bb80 <boost::serialization::singleton<boost::serialization::extended_type_info_typeid<lsst::daf::base::Persistable> >::get_instance()::t>, 
                __in_chrg=<optimized out>) at /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/include/boost/serialization/singleton.hpp:121
            #3  0x00002aaaad80cb69 in __run_exit_handlers () from /lib64/libc.so.6
            #4  0x00002aaaad80cbb7 in exit () from /lib64/libc.so.6
            #5  0x00002aaaad7f53dc in __libc_start_main () from /lib64/libc.so.6
            #6  0x0000000000409e3b in _start ()
            (gdb) 
            

            swinbank John Swinbank added the comment above.
            tjenness Tim Jenness added a comment -

            There are 7 C++ tests in daf_persistence but only 2 of them failed. PropertySet_2 and Persistence_3 are the only two tests that include boost/serialization/export.hpp. Persistence_1 and Persistence_2 use the same serialization code without including the header file and they work. I don't suppose valgrind is showing any oddities between Persistence_1 that works and Persistence_3 that fails?
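The observation about which tests include the header can be checked mechanically. This is a rough sketch under assumptions: the function name is invented, the test sources are assumed to sit in one directory with a `.cc` extension, and only direct `#include` lines are matched (a header pulled in transitively would be missed).

```python
import re
from pathlib import Path


def sources_including(header, directory, pattern="*.cc"):
    """Return sorted names of source files in directory that directly #include header."""
    include = re.compile(
        r'^\s*#\s*include\s*[<"]' + re.escape(header) + r'[>"]',
        re.MULTILINE,
    )
    return sorted(
        path.name
        for path in Path(directory).glob(pattern)
        if include.search(path.read_text(errors="replace"))
    )
```

Calling `sources_including("boost/serialization/export.hpp", "tests")` from the package root would list the candidates to compare against the failing tests.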

            swinbank John Swinbank added a comment - - edited

            Valgrind was an interesting suggestion tjenness. On the Princeton machines, where I'm getting segfaults, tests/Persistence_3 generates a lot of chatter from Valgrind:

            $ valgrind tests/Persistence_3
            ==235923== Memcheck, a memory error detector
            ==235923== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
            ==235923== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
            ==235923== Command: tests/Persistence_3
            ==235923==
            Running 1 test case...
             
            *** No errors detected
            ==235923== Invalid read of size 8
            ==235923==    at 0x515ECD1: boost::serialization::typeid_system::extended_type_info_typeid_0::type_unregister() (in /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/lib/libboost_serialization.so.1.68.0)
            ==235923==    by 0x410087: boost::serialization::singleton<boost::serialization::extended_type_info_typeid<lsst::daf::base::Persistable> >::get_instance()::singleton_wrapper::~singleton_wrapper() (extended_type_info_typeid.hpp:96)
            ==235923==    by 0x7973B68: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
            ==235923==    by 0x7973BB6: exit (in /usr/lib64/libc-2.17.so)
            ==235923==    by 0x795C3DB: (below main) (in /usr/lib64/libc-2.17.so)
            ==235923==  Address 0xaf959b0 is 32 bytes inside a block of size 40 free'd
            ==235923==    at 0x4C2B16D: operator delete(void*) (vg_replace_malloc.c:576)
            ==235923==    by 0x515F2D7: boost::serialization::singleton<std::multiset<boost::serialization::typeid_system::extended_type_info_typeid_0 const*, boost::serialization::typeid_system::type_compare, std::allocator<boost::serialization::typeid_system::extended_type_info_typeid_0 const*> > >::get_instance()::singleton_wrapper::~singleton_wrapper() (in /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/lib/libboost_serialization.so.1.68.0)
            ==235923==    by 0x7973ED9: __cxa_finalize (in /usr/lib64/libc-2.17.so)
            ==235923==    by 0x5157402: ??? (in /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/lib/libboost_serialization.so.1.68.0)
            ==235923==    by 0x4010189: _dl_fini (in /usr/lib64/ld-2.17.so)
            ==235923==    by 0x7973B68: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
            ==235923==    by 0x7973BB6: exit (in /usr/lib64/libc-2.17.so)
            ==235923==    by 0x795C3DB: (below main) (in /usr/lib64/libc-2.17.so)
            ==235923==  Block was alloc'd at
            ==235923==    at 0x4C2A1E3: operator new(unsigned long) (vg_replace_malloc.c:334)
            ==235923==    by 0x515EBEF: boost::serialization::typeid_system::extended_type_info_typeid_0::type_register(std::type_info const&) (in /tigress/HSC/LSST/stack3_perseus_20171107/stack/miniconda3-4.3.21-10a4fa6/Linux64/boost/1.68/lib/libboost_serialization.so.1.68.0)
            ==235923==    by 0x412EBA: extended_type_info_typeid (extended_type_info_typeid.hpp:91)
            ==235923==    by 0x412EBA: singleton_wrapper (singleton.hpp:121)
            ==235923==    by 0x412EBA: boost::serialization::singleton<boost::serialization::extended_type_info_typeid<lsst::daf::base::Persistable> >::get_instance() (singleton.hpp:122)
            ==235923==    by 0x4EF24B0: get_const_instance (singleton.hpp:148)
            ==235923==    by 0x4EF24B0: void_caster_primitive (void_cast.hpp:183)
            ==235923==    by 0x4EF24B0: singleton_wrapper (singleton.hpp:121)
            ==235923==    by 0x4EF24B0: boost::serialization::singleton<boost::serialization::void_cast_detail::void_caster_primitive<lsst::daf::base::PropertySet, lsst::daf::base::Persistable> >::get_instance() (singleton.hpp:122)
            ==235923==    by 0x4EC7E4F: __static_initialization_and_destruction_0 (singleton.hpp:162)
            ==235923==    by 0x4EC7E4F: _GLOBAL__sub_I_PropertySetFormatter.cc (PropertySetFormatter.cc:161)
            ==235923==    by 0x400FAC2: _dl_init (in /usr/lib64/ld-2.17.so)
            ==235923==    by 0x4001029: ??? (in /usr/lib64/ld-2.17.so)
             
             [....]
             
            ==235923==
            ==235923== HEAP SUMMARY:
            ==235923==     in use at exit: 5,173 bytes in 26 blocks
            ==235923==   total heap usage: 5,028 allocs, 5,003 frees, 540,101 bytes allocated
            ==235923==
            ==235923== LEAK SUMMARY:
            ==235923==    definitely lost: 0 bytes in 0 blocks
            ==235923==    indirectly lost: 0 bytes in 0 blocks
            ==235923==      possibly lost: 0 bytes in 0 blocks
            ==235923==    still reachable: 5,173 bytes in 26 blocks
            ==235923==                       of which reachable via heuristic:
            ==235923==                         stdstring          : 168 bytes in 4 blocks
            ==235923==         suppressed: 0 bytes in 0 blocks
            ==235923== Rerun with --leak-check=full to see details of leaked memory
            ==235923==
            ==235923== For counts of detected and suppressed errors, rerun with: -v
            ==235923== ERROR SUMMARY: 12 errors from 12 contexts (suppressed: 0 from 0)
            

            (A bunch of output elided in the middle there to keep Jira happy; full output at https://gist.github.com/jdswinbank/805f9c392a6ba8290e962f8501d52aac).

            However, so does tests/Persistence_1: I've not compared in detail, but they both report “12 errors from 12 contexts” (although Persistence_1 doesn't segfault).

            Dropping back to w_2018_33, both come up Valgrind-clean.

            However, the plot thickens: I see exactly the same thing on lsst-dev01. That is, on lsst-dev01, Valgrind reports a bunch of errors in the daf_persistence test suite for w_2018_34, although w_2018_33 was clean. This looks to me like a regression in Boost, and we're just getting lucky that we're not seeing more segfaults on other systems.

            I note that there are no changes advertised to boost_serialization between 1.66 and 1.68 (according to the release notes), so this seems like either a Boost bug or (possibly) a problem caused by us using private APIs which have been changed without an announcement being made (I've no idea if that's something we're doing: I've no expertise in boost_serialization, and not enough time to dive in and check).
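Comparing Valgrind results across tests and weeklies, as done by eye above, can be reduced to pulling out the `ERROR SUMMARY` line from each log. A minimal sketch, assuming the summary format shown in the quoted output (the function name is hypothetical):

```python
import re

# Matches e.g. "ERROR SUMMARY: 12 errors from 12 contexts (suppressed: 0 from 0)".
_SUMMARY = re.compile(r"ERROR SUMMARY: ([\d,]+) errors? from ([\d,]+) contexts?")


def parse_error_summary(log_text):
    """Return (errors, contexts) from a Memcheck log, or None if no summary line is present."""
    match = _SUMMARY.search(log_text)
    if match is None:
        return None
    # Strip thousands separators defensively before converting.
    errors, contexts = (int(group.replace(",", "")) for group in match.groups())
    return errors, contexts
```

Valgrind writes its report to stderr by default; its `--log-file=` option can send the report to a file per test, which can then be read and passed to this function. Equal counts do not imply identical errors, so this only flags where a detailed comparison is worth doing.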


            As jbosch points out OOB, the failing code should be removed as part of the fallout from RFC-482. Given that, I propose to take no more action here, but will block this ticket on DM-14504 and hope that gets done soon (it won't be before w_2018_36, but if we're lucky it might be in that weekly).

            swinbank John Swinbank added the comment above.
            tjenness Tim Jenness added a comment -

            tribeiro seems to be having the same problem but on CentOS7 with GCC7 compilers. swinbank what OS is Tiger running?

            tjenness Tim Jenness added a comment -

            tribeiro reports that it built fine for him with the devtoolset-6 compilers.


            [swinbank@tiger2-sumire ~]$ lsb_release -a
            LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
            Distributor ID: Springdale
            Description:    Springdale Linux release 7.5 (Verona)
            Release:        7.5
            Codename:       Verona
            

            Springdale? Thanks Princeton. I believe that Springdale 7.5 is effectively a recompilation from scratch of the RHEL 7.5 source — it should be equivalent to, but not the same as, CentOS 7.5.

            Even on systems where this doesn't happen to segfault, Valgrind is still showing problems; switching toolchains might buy some temporary respite, but this code is still rotten and could spontaneously explode when we're not looking.

            swinbank John Swinbank added the comment above.
            jbosch Jim Bosch added a comment -

            I believe all of the failing tests were removed on DM-15767.

            tjenness Tim Jenness added a comment -

            Looks like they are gone. Should we mark as INVALID?


            Let's check and confirm that we are now a) not segfaulting and b) Valgrind clean, then we can get rid of this ticket.

            swinbank John Swinbank added the comment above.

            Confirmed that w_2018_42 is now up & running on Tiger2. Of course, since the code no longer exists I can't actually check that it's Valgrind clean, but I think we can regard this as done.

            swinbank John Swinbank added the comment above.

            People

              swinbank John Swinbank
              swinbank John Swinbank
              Jim Bosch, John Swinbank, Tim Jenness