DM-15593

daf_persistence segfaults on Princeton tiger2 cluster

Details

    • Story
    • Status: Done
    • Resolution: Done

    Description

      Building daf_persistence on tiger2-sumire fails as follows:

      $ exec scl enable devtoolset-6 bash                                                                                                          
      $ bash newinstall.sh
       
        [elided]
       
      $ . loadLSST.bash 
      $ eups distrib install -t w_2018_34 daf_persistence
       
         [elided]
       
        [ 34/34 ]  daf_persistence 16.0-3-g3806c63+6 ... 
       
      ***** error: from /scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/build.log:
      Coverage.py warning: No data was collected. (no-data-collected)
      Global pytest run completed successfully
      Failed test output:
      tests/Persistence_3
       
      Running 1 test case...
       
      *** No errors detected
      tests/PropertySet_2
       
      Running 1 test case...
       
      *** No errors detected
      The following tests failed:
      /scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/daf_persistence-16.0-3-g3806c63+6/tests/.tests/Persistence_3.failed
      /scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/daf_persistence-16.0-3-g3806c63+6/tests/.tests/PropertySet_2.failed
      2 tests failed
      scons: *** [checkTestStatus] Error 1
      scons: building terminated because of errors.
      + exit -4
      eups distrib: Failed to build daf_persistence-16.0-3-g3806c63+6.eupspkg: Command:
              source "/scratch/swinbank/stack_master/eups/2.1.4/bin/setups.sh"; export EUPS_PATH="/scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb"; (/scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/build.sh) >> /scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/build.log 2>&1 4>/scratch/swinbank/stack_master/stack/miniconda3-4.5.4-fcd27eb/EupsBuildDir/Linux64/daf_persistence-16.0-3-g3806c63+6/build.msg
      exited with code 252
      

      This means that the shared stack on Tiger is not currently being updated.
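
      For triage, the failing test names can be pulled out of a build.log like the one above with a short script. This is only a sketch: failed_tests, shell_exit_status and SAMPLE_LOG are hypothetical names, not part of eups or sconsUtils, and the sample paths are shortened placeholders. It assumes the log keeps the "The following tests failed:" ... "N tests failed" layout shown above, and it tolerates the ".failed" suffix being wrapped onto its own line. The second helper shows why the "exit -4" in the log surfaces as "exited with code 252": exit statuses are reported modulo 256.

```python
import re

# Shortened, hypothetical excerpt in the same layout as the build.log above.
SAMPLE_LOG = """\
The following tests failed:
/path/to/build/tests/.tests/Persistence_3.failed
/path/to/build/tests/.tests/PropertySet_2.failed
2 tests failed
scons: *** [checkTestStatus] Error 1
"""


def failed_tests(log_text):
    """Return the test names listed after 'The following tests failed:'."""
    names = []
    in_block = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line == "The following tests failed:":
            in_block = True
            continue
        if not in_block:
            continue
        # The summary line ("2 tests failed") closes the list.
        if re.fullmatch(r"\d+ tests? failed", line):
            break
        if "/tests/.tests/" in line:
            name = line.rsplit("/", 1)[-1]
            # Accept both "name.failed" and a wrapped "name." form.
            for suffix in (".failed", "."):
                if name.endswith(suffix):
                    name = name[: -len(suffix)]
                    break
            names.append(name)
    return names


def shell_exit_status(code):
    """Map the argument given to `exit` onto the 0-255 status the parent sees."""
    return code % 256


if __name__ == "__main__":
    print(failed_tests(SAMPLE_LOG))
    print(shell_exit_status(-4))
```

      Keying on the "/tests/.tests/" path component rather than on a fixed prefix keeps the sketch independent of where the stack is installed.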

          Activity

            swinbank John Swinbank added a comment -

            [swinbank@tiger2-sumire ~]$ lsb_release -a
            LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
            Distributor ID: Springdale
            Description:    Springdale Linux release 7.5 (Verona)
            Release:        7.5
            Codename:       Verona

            Springdale? Thanks, Princeton. I believe that Springdale 7.5 is effectively a from-scratch recompilation of the RHEL 7.5 source: it should be equivalent to, but not identical to, CentOS 7.5.

            Even on systems where this doesn't happen to segfault, Valgrind still reports problems. Switching toolchains might buy some temporary respite, but this code is still rotten and could spontaneously explode when we're not looking.

            jbosch Jim Bosch added a comment -

            I believe all of the failing tests were removed on DM-15767.

            tjenness Tim Jenness added a comment -

            Looks like they are gone. Should we mark as INVALID?

            swinbank John Swinbank added a comment -

            Let's check and confirm that we are now a) not segfaulting and b) Valgrind clean, then we can get rid of this ticket.

            swinbank John Swinbank added a comment -

            Confirmed that w_2018_42 is now up & running on Tiger2. Of course, since the code no longer exists I can't actually check that it's Valgrind clean, but I think we can regard this as done.

            People

              swinbank John Swinbank
              swinbank John Swinbank
              Jim Bosch, John Swinbank, Tim Jenness