Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-5120

Add intelligence to `validate_drp` so it does "A Reasonable Thing" on an unknown output repo

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Validation
    • Labels:
      None

      Description

      validate_drp current takes as input both a repository and a configuration file. The configuration file contains information to construct the list of dataIds to analyze.

      However, these dataIds could be extracted from the repo itself, in cases where the desired is to analyze the entire repo.

      1. Add a function that loads the set of dataIds from the repo.
      2. Select reasonable defaults for the additional parameters specified in the config file.
      3. Design how to handle multiple filters.

        Attachments

          Issue Links

            Activity

            Hide
            wmwood-vasey Michael Wood-Vasey added a comment -

            Let's continue this discussion at the JTM and over the next few weeks. I'll capture the discussion to date in a future ticket or a community.lsst.org discussion. I'll ask for this capability to be reviewed and merged in for now, with this caveat that it fails as one approaches scales of ~1000 catalogs in mind.

            Show
            wmwood-vasey Michael Wood-Vasey added a comment - Let's continue this discussion at the JTM and over the next few weeks. I'll capture the discussion to date in a future ticket or a community.lsst.org discussion. I'll ask for this capability to be reviewed and merged in for now, with this caveat that it fails as one approaches scales of ~1000 catalogs in mind.
            Hide
            wmwood-vasey Michael Wood-Vasey added a comment - - edited

            Changes from DM-5121 are pretty minor. I anticipate this is a quick review.

            https://github.com/lsst/validate_drp/pull/10

            You might find it interesting to run this on a small bit of HSC data or something else sitting around. I've verified the new no-configuration file approach works on the example data from `ci_hsc`.

            Passes Jenkins.
            (but there's no full functionality test, so this means less than it should).
            https://ci.lsst.codes/job/stack-os-matrix/8459/label=centos-6/console

            Passes my analyses with the CFHT and DECam example data sets and my running of `ci_hsc`.

            validate_drp now will analyze all of the available and calibrated dataIds in a repository with no need for explicit specification in the configuration.

            After processing CFHT example data:
            ```
            validateDrp.py Cfht/output
            ```

            Or after running `ci_hsc` tests:
            ```
            validateDrp.py DATA
            ```

            One can do any of
            1. use no configuration file (as above)
            2. pass a configuration file with just validation parameters (good_mag_limit, number of expected matches, ...) but no dataId specifications
            3. pass a configuration file that specifies validation parameters and the dataIds to process.

            It fails on lsst-dev when trying to merge ~1000 catalogs. This is unfortunate, but not completely surprising and out of scope to solve for this ticket. One can still control the dataIds processed by specifying the details a `--configFile`.

            Show
            wmwood-vasey Michael Wood-Vasey added a comment - - edited Changes from DM-5121 are pretty minor. I anticipate this is a quick review. https://github.com/lsst/validate_drp/pull/10 You might find it interesting to run this on a small bit of HSC data or something else sitting around. I've verified the new no-configuration file approach works on the example data from `ci_hsc`. Passes Jenkins. (but there's no full functionality test, so this means less than it should). https://ci.lsst.codes/job/stack-os-matrix/8459/label=centos-6/console Passes my analyses with the CFHT and DECam example data sets and my running of `ci_hsc`. validate_drp now will analyze all of the available and calibrated dataIds in a repository with no need for explicit specification in the configuration. After processing CFHT example data: ``` validateDrp.py Cfht/output ``` Or after running `ci_hsc` tests: ``` validateDrp.py DATA ``` One can do any of 1. use no configuration file (as above) 2. pass a configuration file with just validation parameters (good_mag_limit, number of expected matches, ...) but no dataId specifications 3. pass a configuration file that specifies validation parameters and the dataIds to process. It fails on lsst-dev when trying to merge ~1000 catalogs. This is unfortunate, but not completely surprising and out of scope to solve for this ticket. One can still control the dataIds processed by specifying the details a `--configFile`.
            Hide
            ktl Kian-Tat Lim added a comment - - edited

            Michael Wood-Vasey I'm not sure how you're going to express a 1000-way cross-join, even one that is spatially-limited, in SQL. I guess you could do a self-join on the all-visit Source table with exclusion of same-visit matches. Source-to-Source self-joins are not currently expected to be supported in Qserv. Optimizing them to finish in any reasonable amount of time in any relational database will likely be a challenge.

            Show
            ktl Kian-Tat Lim added a comment - - edited Michael Wood-Vasey I'm not sure how you're going to express a 1000-way cross-join, even one that is spatially-limited, in SQL. I guess you could do a self-join on the all-visit Source table with exclusion of same-visit matches. Source-to-Source self-joins are not currently expected to be supported in Qserv. Optimizing them to finish in any reasonable amount of time in any relational database will likely be a challenge.
            Hide
            swinbank John Swinbank added a comment -

            Sorry this review took so long to happen. Minor comments on the PR, but nothing very substantial. Good to merge.

            Show
            swinbank John Swinbank added a comment - Sorry this review took so long to happen. Minor comments on the PR, but nothing very substantial. Good to merge.
            Hide
            wmwood-vasey Michael Wood-Vasey added a comment -

            Merged to master.

            Show
            wmwood-vasey Michael Wood-Vasey added a comment - Merged to master.

              People

              • Assignee:
                wmwood-vasey Michael Wood-Vasey
                Reporter:
                wmwood-vasey Michael Wood-Vasey
                Reviewers:
                John Swinbank
                Watchers:
                Angelo Fausti, Greg Daues, Hsin-Fang Chiang, John Swinbank, Kian-Tat Lim, Michael Wood-Vasey
              • Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Due:
                  Created:
                  Updated:
                  Resolved: