  1. Data Management
  2. DM-27883

obs_lsst has a race condition between tests and curated calibration ingestion


Details

    • Story
    • Status: Done
    • Resolution: Done
    • None
    • obs_lsst
    • None
    • 1
    • Architecture
    • No

    Description

      pytest as run by the tests SConscript attempts to scan all files, but that set can be in flux as the per-camera ingestCuratedCalibs.py executions can be running at the same time. There are no dependencies between the two.

      I'm not sure if the correct solution is to sequence one before the other or to exclude those directories from pytest.

        Activity

          rhl Robert Lupton added a comment -

          czw Are these your tests? Thoughts?

          czw Christopher Waters added a comment -

          I don't believe so. My only thought is that if new calibrations were added, that might have increased the execution time enough for this to become a problem, but I don't recall anything new being added in the past month.

          ktl Kian-Tat Lim added a comment - - edited

          ingestCuratedCalibs.py is part of the build (SConscripts in the camera directories), not the tests.

          tjenness Tim Jenness added a comment -

          This comment:

          # Note the ordering here is critical. LATISS is put at the end here to ensure
          # that the tests are run first and version.py is created, because creation of
          # of the defect registry required the camera to be instantiated.
          # If other cameras add defect generation they should add their build to
          # the end of this list, along with LATISS
          

          in the SConstruct file implies that these targets explicitly run after the tests (they also hard-code a dependency on the python target), suggesting that the answer is to change the scanning code. Can someone point to an actual error report from this problem? Was there a Jenkins failure I can look at?


          ktl Kian-Tat Lim added a comment -

          https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix-test/detail/stack-os-matrix-test/15/tests is an example of a failure.

          My understanding from a quick glance at sconsUtils (https://github.com/lsst/sconsUtils/blob/master/python/lsst/sconsUtils/scripts.py#L208-L209) is that there is in fact no implied dependency ordering from the target list alone; any dependencies must be added separately.

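The point about explicit dependencies can be illustrated with a toy model. This is only a sketch using Python's standard-library graphlib, not SCons or sconsUtils code; the target names and the concurrent_batches helper are hypothetical. It shows that targets whose only declared prerequisite is the same earlier target may be scheduled together, whatever their position in the target list, and that an explicit edge is what serializes them.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def concurrent_batches(targets, explicit_deps):
    """Group targets into batches that a parallel build may run together.

    Only edges in explicit_deps constrain the order; the position of a
    target in the targets list has no effect.
    """
    sorter = TopologicalSorter()
    for target in targets:
        sorter.add(target)
    for target, prerequisites in explicit_deps.items():
        sorter.add(target, *prerequisites)
    sorter.prepare()

    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical target names, loosely modelled on the discussion above.
targets = ["python", "tests", "latiss"]

# Both later targets depend only on "python": they land in the same batch.
racy = concurrent_batches(targets, {"tests": ["python"], "latiss": ["python"]})

# Adding an explicit edge forces "latiss" to wait for "tests".
ordered = concurrent_batches(
    targets, {"tests": ["python"], "latiss": ["python", "tests"]}
)
```

In the first case "tests" and "latiss" share a batch, which is the situation in which a file scan and a file-producing step can overlap; in the second, the extra edge removes the overlap.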
          tjenness Tim Jenness added a comment -

          In the end I added the directories to the ignore list when running pytest.

          There is also an extra, unrelated fix to the header handling that I noticed whilst testing this; I can remove it if needed. It's a gen2-specific problem: the curated calibs ingest calls the normal raw data ingest, which now calls fix_header, and that causes an extra log message because curated calibrations say they are LATISS but are not raw LATISS data, so the header fixup gets confused.

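The ignore-list approach mentioned in the comment above could look roughly like the following. This is a minimal sketch and not the change actually made on this ticket: the helper name and the directory names are hypothetical placeholders. The only pytest feature it relies on is the `--ignore=<path>` command-line option, which excludes a path from test collection.

```python
from pathlib import Path


def pytest_args_ignoring(base_dir, ignored_dirs, extra_args=()):
    """Build a pytest argument list that skips the given directories.

    Each directory becomes one --ignore=<path> option, so pytest does not
    descend into it during collection.
    """
    args = [str(base_dir)]
    for directory in ignored_dirs:
        args.append(f"--ignore={Path(base_dir) / directory}")
    args.extend(extra_args)
    return args


# Hypothetical directory names: placeholders for wherever each camera's
# curated-calibration ingest writes its output.
args = pytest_args_ignoring(".", ["latiss/CALIB", "imsim/CALIB"], ["-v"])
```

The resulting list could be passed to `pytest.main(args)` or joined into a command line. Excluding the directories sidesteps the race without imposing an ordering between the tests and the ingest step.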

          ktl Kian-Tat Lim added a comment -

          Looks fine. Thanks for dealing with this.

          People

            tjenness Tim Jenness
            ktl Kian-Tat Lim
            Kian-Tat Lim
            Christopher Waters, Kian-Tat Lim, Robert Lupton, Tim Jenness