  1. Data Management
  2. DM-1102

Test Data - Replace Unit-Test Data Package

    Details

    • Type: Epic
    • Status: Won't Fix
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Epic Name:
      Test Data - Replace Unit-Test Data Package

      Description

      The afwdata package is used for many unit tests in afw and meas_algorithms, but I think we've largely stopped using it for new tests. We instead tend to add new data packages or add small test datasets to the packages we're testing - fine solutions in isolation, but a big problem when we do this all the time.

      I think there are several reasons we don't use afwdata more:

      • It's poorly documented.
      • Many of its files are obsolete: they use the old MaskedImage format, or they're outputs from older versions of PhoSim that may not be completely supported by obs_lsstSim anymore.
      • There's no butler to organize the larger, more realistic datasets.
      • There are no truth values for the simulated datasets.

      I think we need to audit the way we use both afwdata and the ad-hoc test data files in individual code packages, gather requirements for other kinds of reusable data for unit tests, and put together a new package that tries to meet those needs without being too large. Eventually I'd like to retire afwdata in favor of this new package entirely, and we might be able to merge it with some other packages that contain test data (obs_test?) as well.
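As a rough illustration of the audit step described above, the sketch below walks a directory of checked-out packages and totals the data files found under each package's tests/ directory. It is only a starting point under stated assumptions: the function name `audit_test_data`, the `DATA_EXTENSIONS` set, and the layout of one package per top-level directory are all hypothetical and not taken from any existing LSST tool.

```python
import os
from collections import defaultdict

# Hypothetical list of file extensions treated as test data; adjust to taste.
DATA_EXTENSIONS = {".fits", ".fz", ".gz", ".pickle", ".txt"}


def audit_test_data(root):
    """Summarize ad-hoc test data under <root>/<package>/tests/.

    Returns a dict mapping package name to (file_count, total_bytes).
    Assumes each top-level directory under root is one package.
    """
    summary = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        # Only look inside a tests/ directory that sits below a package directory.
        if "tests" not in parts[1:]:
            continue
        package = parts[0]
        for name in filenames:
            if os.path.splitext(name)[1].lower() in DATA_EXTENSIONS:
                summary[package][0] += 1
                summary[package][1] += os.path.getsize(os.path.join(dirpath, name))
    return {pkg: (count, size) for pkg, (count, size) in summary.items()}
```

Sorting the result by total size would give a first view of which packages carry the most private test data and are the best candidates for a shared package.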

          Stories in Epic

          Key      Summary                                    Story Points  Assignee    Status
          DM-1106  Convert existing tests to use new package  30            Unassigned  Won't Fix
          DM-1105  Custom mapper for unit test data package   6             Unassigned  Won't Fix
          DM-1104  Generate test data                         10            Unassigned  Won't Fix
          DM-1103  Design unit-test data package              4             Unassigned  Won't Fix

            Activity

            jbosch Jim Bosch added a comment -

            I see that DM-5532 is deferring unit tests until better test data is available - seems relevant to this issue.

            wmwood-vasey Michael Wood-Vasey added a comment -

            @jbosch Can you write down 10 example use cases from the algorithmic development side for what a test data repository would need to support?
            https://jira.lsstcorp.org/browse/DM-1102

            I'd like to carefully understand the needs so that we can balance the size of a package that every developer must install against the comprehensiveness of the tests. In particular, what needs are not met by the existing validation_data_* packages? I'm not specifically advocating for using one of those packages, but understanding what needs are not met by the data in them will help design the test data set for algorithmic development.

            wmwood-vasey Michael Wood-Vasey added a comment -

            I'll start with a few off the top of my head:

            Single-frame measurement:
            1. Image with a few dozen nicely isolated bright stars and a uniform background for testing that things work in a normal case.
            2. Image with a balance of stars, galaxies, and star+galaxies.
            3. Crowded image near the Galactic plane or of a star cluster.
            4. Image with notably varying PSF across the image.
            5. Image with notably varying variance across the image.
            6. Images where getting masking right is key to estimation.
            7. Images with high numbers of "cosmic"-ray images; images with low numbers of "cosmic"-ray images.
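Use case 1, combined with the missing truth values noted in the description, could be prototyped along the following lines. This is a minimal pure-Python sketch, not the method used by any LSST package: the function `make_star_image`, its parameters, the circular Gaussian PSF, and the noiseless uniform background are all assumptions made for illustration.

```python
import math
import random


def make_star_image(width, height, n_stars, background=100.0, sigma=1.5,
                    flux_range=(5000.0, 20000.0), min_sep=10.0, seed=0):
    """Return (image, truth) for a noiseless field of isolated Gaussian stars.

    image is a list of rows of floats; truth is a list of dicts holding the
    true x, y, and flux of each star. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    radius = int(math.ceil(5 * sigma))  # half-width of the stamp for each star
    image = [[background] * width for _ in range(height)]
    truth = []
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    for _ in range(n_stars):
        # Rejection-sample a position at least min_sep from every earlier star.
        for _attempt in range(1000):
            x0 = rng.uniform(radius, width - 1 - radius)
            y0 = rng.uniform(radius, height - 1 - radius)
            if all(math.hypot(x0 - t["x"], y0 - t["y"]) >= min_sep for t in truth):
                break
        else:
            raise ValueError("could not place an isolated star; image too small")
        flux = rng.uniform(*flux_range)
        truth.append({"x": x0, "y": y0, "flux": flux})
        # Add the star's Gaussian profile, sampled at pixel centers.
        for y in range(int(y0) - radius, int(y0) + radius + 1):
            for x in range(int(x0) - radius, int(x0) + radius + 1):
                r2 = (x - x0) ** 2 + (y - y0) ** 2
                image[y][x] += flux * norm * math.exp(-r2 / (2.0 * sigma ** 2))
    return image, truth
```

Because the generator is seeded, the truth list can be regenerated alongside the image instead of being stored, which keeps such a dataset small. Noise, a spatially varying PSF, or cosmic-ray hits for the other use cases would have to be layered on top.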

            swinbank John Swinbank added a comment -

            Following the DM replan and discussions at LSST2017, and notwithstanding my comments above, I suggest a holistic consideration of testing strategy as it pertains to Pipelines should be driven by Pipelines. Jim Bosch & Eric Bellm, let's consider how we want to prioritise this issue as we start planning for S18 over the next couple of months.

            swinbank John Swinbank added a comment -

            Closing this epic down, on the basis that I don't see us addressing it as described: this has not been prioritized over the five years since the epic was created. Jim Bosch, please reopen if you disagree.


              People

              • Assignee: Unassigned
              • Reporter: Jim Bosch
              • Watchers (4): Jim Bosch, John Parejko, John Swinbank, Michael Wood-Vasey
              • Votes: 0
