Data Management / DM-22769

Please do not strip {{tests/}} from Pipelines Docker images

    Details

    • Team: SQuaRE

      Description

      The lsstsqre/centos Docker images are explicitly constructed without the tests directory.

      Unfortunately, the tests for some packages rely on the contents of the tests directory in other packages. For example, when trying to build pipe_tasks against a Dockerized obs_base, I get:

      ____________________________________________ ReadDefectsTestCase.test_read_defects ____________________________________________
      [gw3] linux -- Python 3.7.2 /opt/lsst/software/stack/python/miniconda3-4.7.10/envs/lsst-scipipe-4d7b902/bin/python3.7
       
      self = <test_read_CuratedCalibs.ReadDefectsTestCase testMethod=test_read_defects>
       
          def setUp(self):
      >       butler = dafPersist.ButlerFactory(mapper=BaseMapper()).create()
       
      tests/test_read_CuratedCalibs.py:61: 
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
      tests/test_read_CuratedCalibs.py:48: in __init__
          policy = dafPersist.Policy(os.path.join(ROOT, "BaseMapper.yaml"))
      /opt/lsst/software/stack/stack/miniconda3-4.7.10-4d7b902/Linux64/daf_persistence/19.0.0-1-g6fe20d0+1/python/lsst/daf/persistence/policy.py:80: in __init__
          self.__initFromFile(other)
      /opt/lsst/software/stack/stack/miniconda3-4.7.10-4d7b902/Linux64/daf_persistence/19.0.0-1-g6fe20d0+1/python/lsst/daf/persistence/policy.py:111: in __initFromFile
          self.__initFromYamlFile(path)
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
       
      self = {}
      path = '/opt/lsst/software/stack/stack/miniconda3-4.7.10-4d7b902/Linux64/obs_base/19.0.0-9-ge91d8c4+1/tests/BaseMapper.yaml'
       
          def __initFromYamlFile(self, path):
              """Opens a file at a given path and attempts to load it in from yaml.
          
              :param path:
              :return:
              """
      >       with open(path, 'r') as f:
      E       FileNotFoundError: [Errno 2] No such file or directory: '/opt/lsst/software/stack/stack/miniconda3-4.7.10-4d7b902/Linux64/obs_base/19.0.0-9-ge91d8c4+1/tests/BaseMapper.yaml'
       
      /opt/lsst/software/stack/stack/miniconda3-4.7.10-4d7b902/Linux64/daf_persistence/19.0.0-1-g6fe20d0+1/python/lsst/daf/persistence/policy.py:145: FileNotFoundError
      

      This is happening because $PIPE_TASKS_DIR/tests/test_read_CuratedCalibs.py depends upon $OBS_BASE_DIR/tests/BaseMapper.yaml, which has been removed from the Docker images.
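
One way to confirm this failure mode inside a container is to check for the file directly. The following is a minimal sketch, not code from pipe_tasks or obs_base: the helper name `missing_test_inputs` is hypothetical, and it assumes only that the package directory is exposed through an environment variable such as `OBS_BASE_DIR`, as referenced above.

```python
import os


def missing_test_inputs(env_var, relative_paths, environ=None):
    """Return the paths under $env_var that do not exist on disk.

    Hypothetical helper for illustration only. Raises KeyError if the
    environment variable is not set (i.e. the package is not set up).
    """
    environ = os.environ if environ is None else environ
    package_dir = environ[env_var]
    candidates = [os.path.join(package_dir, rel) for rel in relative_paths]
    return [path for path in candidates if not os.path.exists(path)]


if __name__ == "__main__":
    try:
        missing = missing_test_inputs("OBS_BASE_DIR", ["tests/BaseMapper.yaml"])
    except KeyError:
        print("OBS_BASE_DIR is not set; set up obs_base first")
    else:
        for path in missing:
            print("missing test input:", path)
```

Run in an image where tests/ has been stripped, this should list the same BaseMapper.yaml path that appears in the traceback.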

      This renders the Docker images much less useful for development than they might otherwise be.

      I don't know what the original motivation for stripping tests was (just to save space?). In general, I'd suggest that the Docker images should contain exactly the contents of the packages published at eups.lsst.codes — if it's appropriate to strip something from the Docker image, it must be appropriate to strip it from the package, and vice versa. Please stop special-casing this directory in image construction.

      Adding Josh and Simon as watchers here, as respectively the author of the Docker image building code and the tests that are being broken.

            Activity

            swinbank John Swinbank added a comment -

            Maybe worth adding that a goodly chunk of that seems to be the results of executing the tests, rather than the tests and associated inputs (e.g. there's about 250MB in $PIPE_TASKS_DIR/tests after executing the tests, but only 14MB before). I'm a bit more on the fence about whether we can strip the test outputs.

            tjenness Tim Jenness added a comment -

            Installing output data from tests is one of the things I fixed for lsst_ci in DM-22305 so we may have to make the same fixes to pipe_tasks.

            We do install all sorts of things in our eups installs that are completely unnecessary for a non-developer binary distribution. The tests/.tests directories should not be distributed.

            tjenness Tim Jenness added a comment -

            I've just done a fresh build (with some contamination from a sims build) and I get about 400MB of tests directories now. About 150MB is in .tests directories. obs_base is the winning package with nearly 80MB of data in it (half of that is for one test file and I'm not sure that file needs to be anything more than a few kB). afw has about 20MB of tests and 20MB of .tests. Deleting .tests and being a bit more careful with test files that don't need to be as big as they are could probably get us below 200MB of test files.

            swinbank John Swinbank added a comment -

            Thanks Tim!

            Other than “smaller is better”, do we actually know what we're aiming for here?

            jhoblitt Joshua Hoblitt added a comment - edited

            AFAIK, Docker Hub does not publish a maximum size limit for images. As we know that Docker Hub uses AWS S3 to distribute layers, it seems probable that the maximum S3 object size of 5GiB will apply to the (compressed) Docker layer size. These Docker images are already extraordinarily large and are fairly slow to download and uncompress. I would rather see the image size going down than up, as the downstream Jupyter notebooks are layering on many more gigabytes.


              People

              • Assignee: Frossie Economou (frossie)
              • Reporter: John Swinbank (swinbank)
              • Watchers: John Swinbank, Joshua Hoblitt, Simon Krughoff, Tim Jenness
