Tim Jenness, this isn't quite done (more on that below), but the vast majority of it is ready for review. As requested, I'll try to describe what's left to do well enough that someone else could take it over, though I think reviewing what's in daf_butler should probably happen first, and that will take a while on its own.
There are four intertwined things happening in daf_butler here:
- Refactor the implementation of collections in the Registry to the Manager+Records objects model described on the prototype page.
- Fundamentally change what a "run" is, as described on RFC-663 (also mentioned on the prototype page, but RFC-663 is newer and fleshes out some details).
- Add a new CHAINED collection type, as described on
- Make it so Butler and Registry searches can handle multiple collections at once, moving that logic out of ctrl_mpexec/pipe_base (work originally planned for
From the perspective of daf_butler alone, each of these could have been done on a separate ticket, but each represents a different disruptive change to downstream packages (especially ctrl_mpexec), and I didn't want to go through four of those. So while the commits in daf_butler are somewhat split into separate ranges (see PR), the changes to downstream packages were made to reflect only the final state of daf_butler. Those changes are trivial in most packages, but the changes in obs_base (gen2to3) and ctrl_mpexec (changes to command-line arguments involving collections, as per RFC-663) are not, though those are still < 10% of the daf_butler changes.
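To make the CHAINED collection and multi-collection search ideas above a bit more concrete, here is a toy sketch in plain Python. It is only an illustration of the lookup order: the class and function names (`RunCollection`, `ChainedCollection`, `flatten`, `find_dataset`) and the collection names are hypothetical and are not daf_butler's actual API, and it assumes a chain is simply an ordered list of child collections searched first to last.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class RunCollection:
    """Toy RUN collection: directly holds datasets (hypothetical, not daf_butler)."""
    name: str
    datasets: Dict[str, str] = field(default_factory=dict)  # dataset key -> dataset id


@dataclass
class ChainedCollection:
    """Toy CHAINED collection: an ordered list of child collection names."""
    name: str
    children: List[str]


Collection = Union[RunCollection, ChainedCollection]


def flatten(name: str, registry: Dict[str, Collection]) -> List[str]:
    """Expand a collection name into an ordered, de-duplicated list of RUN names.

    Assumes the chain contains no cycles; a real implementation would have to
    guard against them.
    """
    coll = registry[name]
    if isinstance(coll, RunCollection):
        return [name]
    runs: List[str] = []
    for child in coll.children:
        for run in flatten(child, registry):
            if run not in runs:
                runs.append(run)
    return runs


def find_dataset(key: str, collections: List[str],
                 registry: Dict[str, Collection]) -> Optional[str]:
    """Search several collections in order and return the first match, if any."""
    for name in collections:
        for run in flatten(name, registry):
            dataset_id = registry[run].datasets.get(key)
            if dataset_id is not None:
                return dataset_id
    return None


# A Gen2-style parent/child repo pair expressed as two RUNs plus one chain.
registry: Dict[str, Collection] = {
    "parent_run": RunCollection("parent_run", {"raw": "raw@parent", "calexp": "calexp@parent"}),
    "child_run": RunCollection("child_run", {"calexp": "calexp@child"}),
    "child_chain": ChainedCollection("child_chain", ["child_run", "parent_run"]),
}

print(find_dataset("calexp", ["child_chain"], registry))  # found in the child first
print(find_dataset("raw", ["child_chain"], registry))     # falls through to the parent
```

Putting the child RUN ahead of the parent in the chain is what gives the "child overrides parent" behaviour here, without copying or tagging any datasets.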
The final status, by package:
- daf_butler: probably done enough to merge once other packages are working with it. Ideally we'd add some unit tests specifically targeting the new code in wildcards.py - that gets a lot of free coverage from usage in higher-level, well-tested code, but it's entirely possible I missed some edge cases. But that's the kind of thing that wouldn't be terrible to defer to another ticket in order to move this one along. As noted above, the correspondence between commits and features will be documented on the PR.
- obs_base: will conflict with John Parejko's DM-22655, but I'm pretty confident I can resolve the conflicts without much trouble when the time comes, given that I know a fair bit about both. The big change here is to always ingest directly into a RUN collection (updating config names to reflect that), and then define a CHAINED collection that better maps the parent-child repo linkage in Gen2, avoiding TAGGED collections entirely. These changes also mean that if DM-22655 lands first, this ticket will probably need an obs_decam branch similar to the obs_subaru one to update some config overrides.
- ctrl_mpexec: passes tests locally, but not done. See the message on the last git commit for more, but it mostly needs more tests and some API docs for new classes. Real testing will require running ci_hsc_gen3, which I have not tried to do, and which I expect to require (trivial) changes to the command-line arguments in the shell script that invokes pipetask.
- pipe_base: trivial changes to GraphBuilder to reflect the downstream changes in how we pass collections in when building QuantumGraphs.
- obs_subaru: trivial config changes to adapt to obs_base changes.
- ci_hsc_gen2: trivial changes to adapt to daf_butler and obs_base changes.