  1. Data Management
  2. DM-11792

Investigate loadscope test distribution option

    Details

    • Type: Story
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: pipe_tasks, sconsUtils
    • Labels:
      None
    • Story Points:
      1
    • Team:
      Architecture

      Description

      Some tests have a high setup overhead, which makes them a poor fit for distributed testing. An example is the set of tests in pipe_tasks that create 200MB of test data and then check it. If xdist is used, 200MB of data are created per process, which makes the tests significantly slower and can cause resource exhaustion.

      The ideal solution would be to use a fixture to mark a class as pinned to a node. There is an open issue for that on GitHub, but for now the developers recommend running with --dist=loadscope, which ensures that all tests in a class are sent to the same node.
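To make the grouping concrete, here is a minimal sketch of how tests could be bucketed by scope, assuming the scope of a test is everything before the final "::" in its pytest node ID. This illustrates the idea only and is not a copy of the xdist scheduler. The helper names scope_of and group_by_scope, and the tests/test_other.py node ID, are hypothetical.

```python
from collections import defaultdict


def scope_of(nodeid):
    """Return the part of a pytest node ID before the final '::'.

    For a test method this is 'file::Class'; for a module-level test
    function it is the file path. (Assumed grouping rule, for illustration.)
    """
    return nodeid.rsplit("::", 1)[0]


def group_by_scope(nodeids):
    """Group node IDs so that each scope could be handed to one worker."""
    groups = defaultdict(list)
    for nodeid in nodeids:
        groups[scope_of(nodeid)].append(nodeid)
    return dict(groups)


# Node IDs in the style pytest prints; the last one is a made-up
# module-level test function added to show the per-module case.
NODEIDS = [
    "tests/test_register.py::RegisterTestCase::testRegister",
    "tests/test_register.py::RegisterTestCase::testRejection",
    "tests/test_register.py::MyMemoryTestCase::testLeaks",
    "tests/test_other.py::test_function",
]

GROUPS = group_by_scope(NODEIDS)

if __name__ == "__main__":
    for scope, members in GROUPS.items():
        print(scope, len(members))
```

Under this rule, both RegisterTestCase methods fall into one group, so any class-level setup they share would only need to run on one worker.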

            Activity

            tjenness Tim Jenness added a comment -

            The good news is that --dist=loadscope does work if I move the setup_module code to setUpClass. I can in theory add --dist=loadscope to the setup.cfg file so that this behavior always applies to pipe_tasks when someone runs pytest. The one wrinkle is that I also need to ensure that people can type a bare pytest and have the right thing happen. This means that setup.cfg needs to add the right options by default, and currently this means that you always need a -n 1 in the config file, which adds a little overhead to the one-process case.
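As a rough illustration of moving module-level setup into setUpClass, the sketch below creates its test data once per class instead of once per module import. The class name, file name and the 1 KiB payload are hypothetical stand-ins for the much larger pipe_tasks data; the real tests are not shown here.

```python
import os
import shutil
import tempfile
import unittest


class ExpensiveDataTestCase(unittest.TestCase):
    """Hypothetical test case whose data is built once for the whole class."""

    @classmethod
    def setUpClass(cls):
        # Stand-in for the expensive step (the real tests create ~200MB).
        cls.dataDir = tempfile.mkdtemp()
        cls.dataFile = os.path.join(cls.dataDir, "data.bin")
        with open(cls.dataFile, "wb") as f:
            f.write(b"\0" * 1024)

    @classmethod
    def tearDownClass(cls):
        # Remove the shared data after the last test in the class has run.
        shutil.rmtree(cls.dataDir, ignore_errors=True)

    def testExists(self):
        self.assertTrue(os.path.exists(self.dataFile))

    def testSize(self):
        self.assertEqual(os.path.getsize(self.dataFile), 1024)


def run_example():
    """Run the example test case in-process and return the result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(ExpensiveDataTestCase)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_example()
```

Because the data belongs to the class, a scheduler that keeps all tests of a class on one worker only pays the setup cost on that worker.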

            The other wrinkle is that there are problems with node counts and flake8:

            $ pytest -v --dist=loadscope -n 2 --flake8 --fulltrace  tests/test_register.py 
            ============================= test session starts ==============================
            platform darwin -- Python 3.6.0, pytest-3.2.0, py-1.4.34, pluggy-0.4.0 -- /Users/timj/work/lsstsw3/miniconda/bin/python
            cachedir: .cache
            rootdir: /Volumes/G-RAID with Thunderbolt/transient/lsstsw3/build/pipe_tasks, inifile: setup.cfg
            plugins: session2file-0.1.9, forked-0.3.dev0+g1dd93f6.d20170815, xdist-1.19.2.dev0+g459d52e.d20170815, flake8-0.8.1
            [gw0] darwin Python 3.6.0 cwd: /Volumes/G-RAID with Thunderbolt/transient/lsstsw3/build/pipe_tasks
            [gw1] darwin Python 3.6.0 cwd: /Volumes/G-RAID with Thunderbolt/transient/lsstsw3/build/pipe_tasks
            [gw0] Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 13:19:00)  -- [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
            [gw1] Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 13:19:00)  -- [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
            gw0 [5] / gw1 [5]
            scheduling tests via LoadScopeScheduling
             
            tests/test_register.py::RegisterTestCase::testRegister 
            [gw1] PASSED tests/test_register.py::RegisterTestCase::testRegister 
            tests/test_register.py::RegisterTestCase::testRejection 
            [gw1] PASSED tests/test_register.py::RegisterTestCase::testRejection 
            tests/test_register.py::MyMemoryTestCase::testFileDescriptorLeaks <- ../../../../../../Users/timj/work/lsstsw3/stack/DarwinX86/utils/13.0-9-gf29e843+2/python/lsst/utils/tests.py 
            [gw1] PASSED tests/test_register.py::MyMemoryTestCase::testFileDescriptorLeaks <- ../../../../../../Users/timj/work/lsstsw3/stack/DarwinX86/utils/13.0-9-gf29e843+2/python/lsst/utils/tests.py 
            tests/test_register.py::MyMemoryTestCase::testLeaks <- ../../../../../../Users/timj/work/lsstsw3/stack/DarwinX86/utils/13.0-9-gf29e843+2/python/lsst/utils/tests.py 
            tests/test_register.py 
            [gw1] PASSED tests/test_register.py::MyMemoryTestCase::testLeaks <- ../../../../../../Users/timj/work/lsstsw3/stack/DarwinX86/utils/13.0-9-gf29e843+2/python/lsst/utils/tests.py 
            [gw0] FAILED tests/test_register.py 
             
            =================================== FAILURES ===================================
            _______________ FLAKE8-check(ignoring E133 E226 E228 N802 N803) ________________
            [gw0] darwin -- Python 3.6.0 /Users/timj/work/lsstsw3/miniconda/bin/python
            /Volumes/G-RAID with Thunderbolt/transient/lsstsw3/build/pipe_tasks/tests/test_register.py:4:1: E265 block comment should start with '# '
             
            ===================== 1 failed, 4 passed in 10.99 seconds ======================
            

            whereas this hangs:

            $ pytest -v --dist=loadscope -n 1 --flake8 --fulltrace  tests/test_register.py 
            ============================= test session starts ==============================
            platform darwin -- Python 3.6.0, pytest-3.2.0, py-1.4.34, pluggy-0.4.0 -- /Users/timj/work/lsstsw3/miniconda/bin/python
            cachedir: .cache
            rootdir: /Volumes/G-RAID with Thunderbolt/transient/lsstsw3/build/pipe_tasks, inifile: setup.cfg
            plugins: session2file-0.1.9, forked-0.3.dev0+g1dd93f6.d20170815, xdist-1.19.2.dev0+g459d52e.d20170815, flake8-0.8.1
            [gw0] darwin Python 3.6.0 cwd: /Volumes/G-RAID with Thunderbolt/transient/lsstsw3/build/pipe_tasks
            [gw0] Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 13:19:00)  -- [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
            gw0 [5]
            scheduling tests via LoadScopeScheduling
            

            seemingly trying to contact processes. It works if --flake8 is removed.

            $ pytest -v --dist=loadscope -n 1 --fulltrace  tests/test_register.py 
            ============================= test session starts ==============================
            platform darwin -- Python 3.6.0, pytest-3.2.0, py-1.4.34, pluggy-0.4.0 -- /Users/timj/work/lsstsw3/miniconda/bin/python
            cachedir: .cache
            rootdir: /Volumes/G-RAID with Thunderbolt/transient/lsstsw3/build/pipe_tasks, inifile: setup.cfg
            plugins: session2file-0.1.9, forked-0.3.dev0+g1dd93f6.d20170815, xdist-1.19.2.dev0+g459d52e.d20170815, flake8-0.8.1
            [gw0] darwin Python 3.6.0 cwd: /Volumes/G-RAID with Thunderbolt/transient/lsstsw3/build/pipe_tasks
            [gw0] Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 13:19:00)  -- [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
            gw0 [4]
            scheduling tests via LoadScopeScheduling
             
            tests/test_register.py::RegisterTestCase::testRegister 
            [gw0] PASSED tests/test_register.py::RegisterTestCase::testRegister 
            tests/test_register.py::RegisterTestCase::testRejection 
            [gw0] PASSED tests/test_register.py::RegisterTestCase::testRejection 
            tests/test_register.py::MyMemoryTestCase::testFileDescriptorLeaks <- ../../../../../../Users/timj/work/lsstsw3/stack/DarwinX86/utils/13.0-9-gf29e843+2/python/lsst/utils/tests.py 
            [gw0] PASSED tests/test_register.py::MyMemoryTestCase::testFileDescriptorLeaks <- ../../../../../../Users/timj/work/lsstsw3/stack/DarwinX86/utils/13.0-9-gf29e843+2/python/lsst/utils/tests.py 
            tests/test_register.py::MyMemoryTestCase::testLeaks <- ../../../../../../Users/timj/work/lsstsw3/stack/DarwinX86/utils/13.0-9-gf29e843+2/python/lsst/utils/tests.py 
            [gw0] PASSED tests/test_register.py::MyMemoryTestCase::testLeaks <- ../../../../../../Users/timj/work/lsstsw3/stack/DarwinX86/utils/13.0-9-gf29e843+2/python/lsst/utils/tests.py 
             
            ========================== 4 passed in 10.25 seconds ===========================
            

            So I think this may be a manifestation of the pytest-flake8 plugin not registering itself properly with pytest. It is known to interact badly with pytest-random-order. In the successful run with 2 subprocesses, the flake8 test is put on its own process.


              People

              • Assignee:
                tjenness Tim Jenness
                Reporter:
                tjenness Tim Jenness
                Watchers:
                Russell Owen, Tim Jenness