Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-11693

Rebuild runs pytest tests twice for every product

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: lsstsw, sconsUtils
    • Labels:
      None
    • Team:
      Architecture

      Description

      I have seen a case where obs_sdss is built with rebuild and seems to pass, but where pytest fails and writes a .failed file. This has been seen on Jenkins and on my laptop. Running scons from the command line behaves as expected.

      Examining the log file on my laptop, the odd thing is that pytest runs twice:

      [2017-08-29T19:39:02.373155Z] running global pytest... 
      [2017-08-29T19:39:02.640250Z] Could not rewrite shebang of bin.src/genCameraRegistry.py. Please check file or move it to bin directory.
      [2017-08-29T19:39:04.775009Z] ============================= test session starts ==============================
      [2017-08-29T19:39:04.775102Z] platform darwin -- Python 3.6.1, pytest-3.2.0, py-1.4.34, pluggy-0.4.0
      [2017-08-29T19:39:04.775124Z] rootdir: /Users/timj/work/lsstsw3/build/obs_sdss, inifile:
      [2017-08-29T19:39:04.775146Z] plugins: session2file-0.1.9, xdist-1.19.2.dev0+g459d52e.d20170815, forked-0.3.dev0+g1dd93f6.d20170815, flake8-0.8.1
      [2017-08-29T19:39:04.786264Z] gw0 I / gw1 I / gw2 I / gw3 I
      [2017-08-29T19:39:34.090438Z] gw0 [28] / gw1 [28] / gw2 [28] / gw3 [28]
      [2017-08-29T19:39:34.090629Z] 
      [2017-08-29T19:39:34.090656Z] scheduling tests via LoadScheduling
      [2017-08-29T19:39:54.480372Z] ............................
      [2017-08-29T19:39:54.480545Z]  generated xml file: /Users/timj/work/lsstsw3/build/obs_sdss/tests/.tests/pytest-obs_sdss.xml 
      [2017-08-29T19:39:54.480645Z] =============================== warnings summary ===============================
      [2017-08-29T19:39:54.480672Z] tests/testSdss.py::SdssMapperTestCase::testGetDR7
      [2017-08-29T19:39:54.480719Z]   /Users/timj/work/lsstsw3/stack/DarwinX86/obs_base/13.0-43-g43302d7/python/lsst/obs/base/cameraMapper.py:247: ResourceWarning: unclos
      ed file <_io.TextIOWrapper name='/Users/timj/work/lsstsw3/build/obs_sdss/tests/data/dr7/runs/registry.sqlite3' mode='r' encoding='US-ASCII'>
      [2017-08-29T19:39:54.480959Z]     posixIfNoSql=(not parentRegistry))
      [2017-08-29T19:39:54.480989Z]   /Users/timj/work/lsstsw3/build/obs_sdss/python/lsst/obs/sdss/sdssMapper.py:116: ResourceWarning: unclosed file <_io.FileIO name='/Us
      ers/timj/work/lsstsw3/build/obs_sdss/tests/data/dr7/runs/5754/40/objcs/3/fpM-005754-r3-0280.fit' mode='rb' closefd=True>
      [2017-08-29T19:39:54.481292Z]     return convertfpM(location.getLocationsWithRoot()[0])
      [2017-08-29T19:39:54.481309Z]   /Users/timj/work/lsstsw3/build/obs_sdss/python/lsst/obs/sdss/sdssMapper.py:123: ResourceWarning: unclosed file <_io.FileIO name='/Us
      ers/timj/work/lsstsw3/build/obs_sdss/tests/data/dr7/runs/5754/40/astrom/asTrans-005754.fit' mode='rb' closefd=True>
      [2017-08-29T19:39:54.481319Z]     dataId['camcol'], dataId['field'])
      [2017-08-29T19:39:54.481327Z] 
      [2017-08-29T19:39:54.481335Z] -- Docs: http://doc.pytest.org/en/latest/warnings.html
      [2017-08-29T19:39:54.481344Z] ==================== 28 passed, 3 warnings in 49.71 seconds ====================
      [2017-08-29T19:39:54.561410Z] Global pytest run completed successfully
      [2017-08-29T19:39:54.617564Z] scons: done building targets.
      [2017-08-29T19:39:54.921313Z] + '[' -d tests/.tests ']'
      [2017-08-29T19:39:54.927900Z] ++ ls 'tests/.tests/*.failed'
      [2017-08-29T19:39:54.928767Z] ++ wc -l
      [2017-08-29T19:39:54.935286Z] + '[' '       0' -ne 0 ']'
      [2017-08-29T19:39:54.935337Z] + eupspkg PRODUCT=obs_sdss VERSION=13.0-8-g507a745+14 FLAVOR=generic install
      [2017-08-29T19:40:02.673446Z] scons: Reading SConscript files ...
      [2017-08-29T19:40:04.035258Z] Setting OS X binary compatibility level: 10.9
      [2017-08-29T19:40:04.035535Z] EUPS integration: enabled
      [2017-08-29T19:40:04.263444Z] Checking who built the CC compiler...(cached) clang=8.1.0
      [2017-08-29T19:40:04.263525Z] CC is clang version 8.1.0
      [2017-08-29T19:40:04.264253Z] Checking for C++11 support
      [2017-08-29T19:40:04.283121Z] Checking whether the C++ compiler works... (cached) yes
      [2017-08-29T19:40:04.283214Z] C++11 supported with '-std=c++11'
      [2017-08-29T19:40:04.392950Z] Setting up environment to build package 'obs_sdss'.
      [2017-08-29T19:40:06.117872Z] Running pytest with 4 processes
      [2017-08-29T19:40:06.117962Z] pytest: running on 4 Python test files.
      [2017-08-29T19:40:06.155333Z] scons: done reading SConscript files.
      [2017-08-29T19:40:06.156385Z] scons: Building targets ...
      [2017-08-29T19:40:06.268000Z] running global pytest... 
      [2017-08-29T19:40:07.973313Z] ============================= test session starts ==============================
      [2017-08-29T19:40:07.973488Z] platform darwin -- Python 3.6.1, pytest-3.2.0, py-1.4.34, pluggy-0.4.0
      [2017-08-29T19:40:07.974415Z] rootdir: /Users/timj/work/lsstsw3/build/obs_sdss, inifile:
      [2017-08-29T19:40:07.974455Z] plugins: session2file-0.1.9, xdist-1.19.2.dev0+g459d52e.d20170815, forked-0.3.dev0+g1dd93f6.d20170815, flake8-0.8.1
      [2017-08-29T19:40:07.989820Z] gw0 I / gw1 I / gw2 I / gw3 I
      [2017-08-29T19:40:35.949351Z] gw0 [28] / gw1 [28] / gw2 [28] / gw3 [28]
      [2017-08-29T19:40:35.949741Z] 
      [2017-08-29T19:40:35.949789Z] scheduling tests via LoadScheduling
      [2017-08-29T19:40:57.933160Z] ........F...................
      [2017-08-29T19:40:57.933300Z]  generated xml file: /Users/timj/work/lsstsw3/build/obs_sdss/tests/.tests/pytest-obs_sdss.xml 
      [2017-08-29T19:40:57.933318Z] =================================== FAILURES ===================================
      [2017-08-29T19:40:57.933326Z] _____________________________ GetIdTestCase.testId _____________________________
      [2017-08-29T19:40:57.933334Z] [gw0] darwin -- Python 3.6.1 /Users/timj/work/lsstsw3/miniconda/bin/python
      [2017-08-29T19:40:57.933341Z] 
      [2017-08-29T19:40:57.933348Z] self = <testGetId.GetIdTestCase testMethod=testId>
      [2017-08-29T19:40:57.933355Z] 
      [2017-08-29T19:40:57.933367Z]     def setUp(self):
      [2017-08-29T19:40:57.933375Z]         self.bf = dafPersist.ButlerFactory(mapper=SdssMapper(root="."))
      [2017-08-29T19:40:57.933381Z] >       self.butler = self.bf.create()
      [2017-08-29T19:40:57.933388Z] 
      [2017-08-29T19:40:57.933394Z] tests/testGetId.py:37: 
      [2017-08-29T19:40:57.933400Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
      [2017-08-29T19:40:57.933406Z] ../../stack/DarwinX86/daf_persistence/13.0-32-g7b14ddd/python/lsst/daf/persistence/butlerFactory.py:79: in create
      [2017-08-29T19:40:57.933413Z]     return Butler(root=root, mapper=self.mapper)
      [2017-08-29T19:40:57.933419Z] ../../stack/DarwinX86/daf_persistence/13.0-32-g7b14ddd/python/lsst/daf/persistence/butler.py:521: in __init__
      [2017-08-29T19:40:57.933425Z]     self._getCfgs(repoDataList)
      [2017-08-29T19:40:57.933436Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
      [2017-08-29T19:40:57.933443Z] 
      [2017-08-29T19:40:57.936494Z] ../../stack/DarwinX86/daf_persistence/13.0-32-g7b14ddd/python/lsst/daf/persistence/butler.py:799: RuntimeError
      [2017-08-29T19:40:57.936500Z] ----------------------------- Captured stdout call -----------------------------
      [2017-08-29T19:40:57.936505Z] CameraMapper INFO: Loading Posix exposure registry from .
      [2017-08-29T19:40:57.936512Z] daf.persistence.butler WARN: Passing an instantiated mapper into Butler.__init__ will prevent Butler from passing parentRegistry or repositoryCfg information to the mapper, which is done only at init time. It is better to pass a importable string or class object.
      [2017-08-29T19:40:57.936520Z] daf.persistence.butler WARN: Passing an instantiated mapper into Butler.__init__ will prevent Butler from passing parentRegistry or repositoryCfg information to the mapper, which is done only at init time. It is better to pass a importable string or class object.
      [2017-08-29T19:40:57.936525Z] CameraMapper INFO: Loading Posix exposure registry from .
      [2017-08-29T19:40:57.936530Z] =============================== warnings summary ===============================
      [2017-08-29T19:40:57.936536Z] tests/testSdss.py::SdssMapperTestCase::testGetDR7
      [2017-08-29T19:40:57.936541Z]   /Users/timj/work/lsstsw3/stack/DarwinX86/obs_base/13.0-43-g43302d7/python/lsst/obs/base/cameraMapper.py:247: ResourceWarning: unclosed file <_io.TextIOWrapper name='/Users/timj/work/lsstsw3/build/obs_sdss/tests/data/dr7/runs/registry.sqlite3' mode='r' encoding='US-ASCII'>
      [2017-08-29T19:40:57.936547Z]     posixIfNoSql=(not parentRegistry))
      [2017-08-29T19:40:57.936553Z]   /Users/timj/work/lsstsw3/build/obs_sdss/python/lsst/obs/sdss/sdssMapper.py:116: ResourceWarning: unclosed file <_io.FileIO name='/Users/timj/work/lsstsw3/build/obs_sdss/tests/data/dr7/runs/5754/40/objcs/3/fpM-005754-r3-0280.fit' mode='rb' closefd=True>
      [2017-08-29T19:40:57.936559Z]     return convertfpM(location.getLocationsWithRoot()[0])
      [2017-08-29T19:40:57.936564Z]   /Users/timj/work/lsstsw3/build/obs_sdss/python/lsst/obs/sdss/sdssMapper.py:123: ResourceWarning: unclosed file <_io.FileIO name='/Users/timj/work/lsstsw3/build/obs_sdss/tests/data/dr7/runs/5754/40/astrom/asTrans-005754.fit' mode='rb' closefd=True>
      [2017-08-29T19:40:57.936569Z]     dataId['camcol'], dataId['field'])
      [2017-08-29T19:40:57.936574Z] 
      [2017-08-29T19:40:57.936579Z] -- Docs: http://doc.pytest.org/en/latest/warnings.html
      [2017-08-29T19:40:57.936584Z] =============== 1 failed, 27 passed, 3 warnings in 49.96 seconds ===============
      [2017-08-29T19:40:58.027508Z] Global pytest run: failed
      [2017-08-29T19:40:58.056627Z] Install file: "ups/obs_sdss.build" as "/Users/timj/work/lsstsw3/stack/DarwinX86/obs_sdss/13.0-8-g507a745+14/ups/obs_sdss.build"
      [2017-08-29T19:40:58.058685Z] Install file: "ups/obs_sdss.cfg" as "/Users/timj/work/lsstsw3/stack/DarwinX86/obs_sdss/13.0-8-g507a745+14/ups/obs_sdss.cfg"
      

      I think exit status for tests depends on the scan for .failed files so I am wondering if the initial pass causes that scan to trigger, allowing install to work and the later test happens after that scan for .failed.

      The actual bug in obs_sdss is being handled in DM-11691. DM-11692 will be useful as it was hard to understand the Jenkins behavior without access to the _build.log file.

        Attachments

          Issue Links

            Activity

            Hide
            tjenness Tim Jenness added a comment -

            The reason there are two entries in the log is that rebuild uses eupspkg and eupspkg runs the build phase, scons and then the install phase scons install. When I implemented DM-11514 I handed off test re-running to pytest, telling it only to re-run failed tests, but scons will always run pytest because, otherwise the re-run mode is meaningless. This is fine when you are developing and typing scons a lot but not very efficient when you spend 2 minutes running the tests in afw and then spend another two minutes running those same tests when you run the install.

            The obvious next question is why does obs_sdss fail on the second run through (resulting in different warnings): this is caused by the test not clearing up after itself.

            Show
            tjenness Tim Jenness added a comment - The reason there are two entries in the log is that rebuild uses eupspkg and eupspkg runs the build phase, scons and then the install phase scons install . When I implemented DM-11514 I handed off test re-running to pytest , telling it only to re-run failed tests, but scons will always run pytest because, otherwise the re-run mode is meaningless. This is fine when you are developing and typing scons a lot but not very efficient when you spend 2 minutes running the tests in afw and then spend another two minutes running those same tests when you run the install . The obvious next question is why does obs_sdss fail on the second run through (resulting in different warnings): this is caused by the test not clearing up after itself.
            Hide
            tjenness Tim Jenness added a comment -
            Show
            tjenness Tim Jenness added a comment - Another example: https://ci.lsst.codes/job/stack-os-matrix/27196/
            Hide
            womullan Wil O'Mullane added a comment -

            Brian Van Klaveren  would you like to take a look at this ?

            Show
            womullan Wil O'Mullane added a comment - Brian Van Klaveren   would you like to take a look at this ?
            Hide
            bvan Brian Van Klaveren added a comment - - edited

            assigned to me, I won't get to it for a few weeks.

            Show
            bvan Brian Van Klaveren added a comment - - edited assigned to me, I won't get to it for a few weeks.
            Hide
            tjenness Tim Jenness added a comment -

            The other point about always running pytest is that scons no longer knows which tests are going to be run because the pytest plugins can change things. In particular, flake8 testing depends on all python files being tested if they change.

            Show
            tjenness Tim Jenness added a comment - The other point about always running pytest is that scons no longer knows which tests are going to be run because the pytest plugins can change things. In particular, flake8 testing depends on all python files being tested if they change.
            Hide
            bvan Brian Van Klaveren added a comment -

            The problem is that pytest writes out nothing to the pytest cache if all the tests succeed. So --lf doesn't actually buy you anything if the tests succeeded the first case. Since there's not pytest cache

            So, In the case where pytest actually succeeds, --lf just means you run all the tests all over again.

            A fix might be write out a timestamp to a {{tests/.tests/lastSuccess }}file and check that with scons install

            Show
            bvan Brian Van Klaveren added a comment - The problem is that pytest writes out nothing to the pytest cache if all the tests succeed. So  --lf doesn't actually buy you anything if the tests succeeded the first case. Since there's not pytest cache So, In the case where pytest actually succeeds, --lf just means you run all the tests all over again. A fix might be write out a timestamp to a {{tests/.tests/lastSuccess }}file and check that with scons install
            Show
            ktl Kian-Tat Lim added a comment - https://docs.pytest.org/en/latest/cache.html#behavior-when-no-tests-failed-in-the-last-run ?
            Hide
            ktl Kian-Tat Lim added a comment - - edited

            (Added in 3.5.0, I think.  Yes: https://github.com/pytest-dev/pytest/issues/3139)

            Show
            ktl Kian-Tat Lim added a comment - - edited (Added in 3.5.0, I think.  Yes: https://github.com/pytest-dev/pytest/issues/3139 )
            Hide
            bvan Brian Van Klaveren added a comment - - edited

            I'm still not sure if that's what we want. I think it would work okay in the context of lsst_build though. The problem is pytest still can't distinguish between tests never running and tests that have been run but passed.

            e.g. I tested the following:

             

            I think there's a middle ground where we do something like supply --lfnf=${LFNF_OPT} and only set LFNF_OPT to "none" if we found a session file, otherwise "all".

            For example:

            pytest -Wd --lf --lfnf=${LFNF_OPT} --junit-xml=foo.xml --session2file=foo.out
            

            Show
            bvan Brian Van Klaveren added a comment - - edited I'm still not sure if that's what we want. I think it would work okay in the context of lsst_build though. The problem is pytest still can't distinguish between tests never running and tests that have been run but passed. e.g. I tested the following:   I think there's a middle ground where we do something like supply  --lfnf=${LFNF_OPT } and only set LFNF_OPT to "none" if we found a session file, otherwise "all". For example: pytest -Wd --lf --lfnf=${LFNF_OPT} --junit-xml=foo.xml --session2file=foo.out
            Hide
            tjenness Tim Jenness added a comment -

            This looks like the best approach for now. I'm happy for it to be merged.

            Show
            tjenness Tim Jenness added a comment - This looks like the best approach for now. I'm happy for it to be merged.
            Hide
            rowen Russell Owen added a comment -

            Why do you ignore N807 on standard "dunder" methods in package afw on this ticket branch? I've not seen such methods trigger an error in flake8 before and they certainly should not. What version of flake8 do you have where you see the failure?

            Show
            rowen Russell Owen added a comment - Why do you ignore N807 on standard "dunder" methods in package afw on this ticket branch? I've not seen such methods trigger an error in flake8 before and they certainly should not. What version of flake8 do you have where you see the failure?
            Hide
            tjenness Tim Jenness added a comment -

            PEP8 naming has been updated (we needed a fix for metaclasses). N807 triggers because these dunder methods are not inside a class but are free functions.

            Show
            tjenness Tim Jenness added a comment - PEP8 naming has been updated (we needed a fix for metaclasses). N807 triggers because these dunder methods are not inside a class but are free functions.
            Hide
            rowen Russell Owen added a comment - - edited

            I see. Old wrappers. I suggest it would be better to modernize the wrapper by using @continueClass from lsst.utils (which admittedly requires its own noqa: @continueClass # noqa F811

            Show
            rowen Russell Owen added a comment - - edited I see. Old wrappers. I suggest it would be better to modernize the wrapper by using @continueClass from lsst.utils (which admittedly requires its own noqa: @continueClass # noqa F811
            Hide
            bvan Brian Van Klaveren added a comment -

            Build passed with Jenkins job #28667

            Show
            bvan Brian Van Klaveren added a comment - Build passed with Jenkins job #28667
            Hide
            rowen Russell Owen added a comment - - edited

            I am sure you have done this (and am being lazy about looking at the pull request again) but please be careful of outdated code. I think I modernized all cases where "dunder" free functions were used in afw (eliminating those functions and thus the need for ignoring N807).

            Show
            rowen Russell Owen added a comment - - edited I am sure you have done this (and am being lazy about looking at the pull request again) but please be careful of outdated code. I think I modernized all cases where "dunder" free functions were used in afw (eliminating those functions and thus the need for ignoring N807).
            Hide
            bvan Brian Van Klaveren added a comment -

            There were two places it was still needed:

            python/lsst/afw/image/slicing.py
            python/lsst/afw/table/match/matchContinued.py

            Show
            bvan Brian Van Klaveren added a comment - There were two places it was still needed: python/lsst/afw/image/slicing.py python/lsst/afw/table/match/matchContinued.py
            Hide
            bvan Brian Van Klaveren added a comment -

            Merged

            Show
            bvan Brian Van Klaveren added a comment - Merged

              People

              Assignee:
              bvan Brian Van Klaveren
              Reporter:
              tjenness Tim Jenness
              Reviewers:
              Tim Jenness
              Watchers:
              Brian Van Klaveren, Joshua Hoblitt, Kian-Tat Lim, Russell Owen, Tim Jenness, Wil O'Mullane
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.