Fix Version/s: None
validate_drp currently takes as input both a repository and a configuration file. The configuration file contains the information needed to construct the list of dataIds to analyze.
However, these dataIds could be extracted from the repository itself, in cases where the goal is to analyze the entire repo.
1. Add a function that loads the set of dataIds from the repo.
2. Select reasonable defaults for the additional parameters specified in the config file.
3. Design how to handle multiple filters.
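Step 1 could be sketched roughly as below. In the real stack this would go through the Butler (`lsst.daf.persistence.Butler`); here a hypothetical `MiniButler` stand-in exposes the two calls the sketch needs, so the example is self-contained. All names and the `(visit, ccd)` key choice are illustrative, not the actual `validate_drp` implementation.

```python
class MiniButler:
    """Toy stand-in for the two Butler methods this sketch uses."""

    def __init__(self, metadata, calibrated):
        self._metadata = metadata      # list of (visit, ccd) tuples in the repo
        self._calibrated = calibrated  # set of (visit, ccd) with calibrated outputs

    def queryMetadata(self, dataset_type, keys):
        return list(self._metadata)

    def datasetExists(self, dataset_type, data_id):
        return (data_id["visit"], data_id["ccd"]) in self._calibrated


def load_data_ids(butler, dataset_type="src", keys=("visit", "ccd")):
    """Return a dataId dict for every calibrated dataset found in the repo."""
    data_ids = []
    for values in butler.queryMetadata(dataset_type, list(keys)):
        data_id = dict(zip(keys, values))
        # Keep only dataIds whose calibrated outputs actually exist.
        if butler.datasetExists(dataset_type, data_id):
            data_ids.append(data_id)
    return data_ids


butler = MiniButler(
    metadata=[(849375, 12), (849375, 13), (850587, 12)],
    calibrated={(849375, 12), (850587, 12)},
)
print(load_data_ids(butler))
# [{'visit': 849375, 'ccd': 12}, {'visit': 850587, 'ccd': 12}]
```

The key idea is simply: enumerate what the repo advertises, then keep only the dataIds whose calibrated products exist.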
- is blocked by
DM-5121 Add multiple-filter capabilities to `validate_drp`
The changes for DM-5121 are pretty minor. I anticipate this will be a quick review.
You might find it interesting to run this on a small bit of HSC data or something else sitting around. I've verified that the new no-configuration-file approach works on the example data from `ci_hsc`
(but there's no full-functionality test, so this means less than it should).
Passes my analyses with the CFHT and DECam example data sets and my running of `ci_hsc`.
validate_drp will now analyze all of the available, calibrated dataIds in a repository, with no need for explicit specification in the configuration file.
After processing CFHT example data:
Or after running `ci_hsc` tests:
One can do any of the following:
1. use no configuration file (as above)
2. pass a configuration file with just validation parameters (good_mag_limit, number of expected matches, ...) but no dataId specifications
3. pass a configuration file that specifies validation parameters and the dataIds to process.
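The three modes above amount to overlaying an optional user configuration on a set of defaults. A minimal sketch, assuming hypothetical parameter names (`good_mag_limit`, `min_matches` are illustrative, not the actual `validate_drp` keys):

```python
# Illustrative defaults; the real validation parameters differ.
DEFAULTS = {"good_mag_limit": 21.0, "min_matches": 50}


def resolve_config(user_config=None):
    """Overlay an optional user config dict on the defaults.

    Mode 1: user_config is None -> pure defaults, dataIds taken from the repo.
    Mode 2: user_config carries validation parameters but no dataIds.
    Mode 3: user_config also carries an explicit "dataIds" list.
    """
    config = dict(DEFAULTS)
    config.update(user_config or {})
    return config


print(resolve_config())                           # mode 1: defaults only
print(resolve_config({"good_mag_limit": 19.0}))   # mode 2: override one parameter
```

In mode 3 the presence of a `"dataIds"` key would simply bypass the repo-scanning step.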
It fails on lsst-dev when trying to merge ~1000 catalogs. This is unfortunate but not entirely surprising, and it is out of scope to solve in this ticket. One can still control which dataIds are processed by specifying the details in a `--configFile`.
Michael Wood-Vasey I'm not sure how you're going to express a 1000-way cross-join, even one that is spatially-limited, in SQL. I guess you could do a self-join on the all-visit Source table with exclusion of same-visit matches. Source-to-Source self-joins are not currently expected to be supported in Qserv. Optimizing them to finish in any reasonable amount of time in any relational database will likely be a challenge.
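To make the self-join idea concrete, here is a toy version using SQLite: pair up detections of the same object across visits while excluding same-visit matches. The table layout is made up, and `objectId` equality stands in for the spatial match a real query would need; Qserv and the actual Source schema would differ substantially.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Source (id INTEGER, visit INTEGER, objectId INTEGER)")
con.executemany(
    "INSERT INTO Source VALUES (?, ?, ?)",
    [(1, 100, 7), (2, 101, 7), (3, 101, 8), (4, 102, 8)],
)
# Self-join on the all-visit Source table. "a.visit < b.visit" both
# excludes same-visit matches and avoids emitting each pair twice.
pairs = sorted(con.execute("""
    SELECT a.id, b.id
      FROM Source AS a
      JOIN Source AS b
        ON a.objectId = b.objectId
       AND a.visit < b.visit
""").fetchall())
print(pairs)
# [(1, 2), (3, 4)]
```

This is easy at four rows; the scaling concern in the comment above is that an N-visit version of this join grows combinatorially and is not expected to be supported in Qserv.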
Sorry this review took so long to happen. Minor comments on the PR, but nothing very substantial. Good to merge.
Let's continue this discussion at the JTM and over the next few weeks. I'll capture the discussion to date in a future ticket or a community.lsst.org discussion. I'll ask for this capability to be reviewed and merged for now, with the caveat in mind that it fails as one approaches scales of ~1000 catalogs.