The direction the code is going is a good one; in particular, the repository of repository configs is a good primitive for this and similar use cases. But substantial portions of this code (some unrelated to the ticket itself) do not yet feel like a releasable feature set that we could announce in release notes. I've been thinking a bit about how to handle complex, multi-part, interdependent developments like this so that users are not disrupted while new, not-quite-ready features are being built, and I think it comes down to two alternatives:
- Merging to a long-lived integration branch that is not master
- Merging to master with a "version switch" triggered by (in this case) a Butler construction argument or perhaps an environment variable, so that the new interfaces and implementation are enabled only when explicitly requested.
If interface changes are extensive and could be frequent due to uncertainty and evolution, then the first is probably preferable to minimize disruption to dependent package users until the interface is firmed up. If interface changes are expected to be minimal and infrequent because the interface is well-defined, then the second could be acceptable and would help with the eventual merge since dependent package users could help maintain compatibility with both versions while making unrelated changes.
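To make the second alternative concrete, here is a minimal sketch of a version switch. Every name in it (the `BUTLER_USE_NEW_REPO_CONFIG` environment variable, `use_new_interface`, `make_butler_impl`, and the two placeholder classes) is hypothetical and not part of the existing Butler code; the only point is that the legacy path stays the default and the new path runs only on explicit request.

```python
import os

# Hypothetical environment variable name; not an existing Butler setting.
_ENV_VAR = "BUTLER_USE_NEW_REPO_CONFIG"


def use_new_interface(explicit=None, environ=None):
    """Return True only when the new implementation is explicitly requested.

    An explicit argument always wins; otherwise fall back to the
    environment variable; otherwise stay on the legacy path.
    """
    if explicit is not None:
        return bool(explicit)
    env = os.environ if environ is None else environ
    return env.get(_ENV_VAR, "").strip().lower() in {"1", "true", "yes"}


class LegacyImpl:
    """Placeholder for the current, released behaviour."""
    name = "legacy"


class NewImpl:
    """Placeholder for the new, not-yet-announced behaviour."""
    name = "new"


def make_butler_impl(root, use_new=None, environ=None):
    """Pick an implementation for `root` without disturbing default users."""
    impl_cls = NewImpl if use_new_interface(use_new, environ) else LegacyImpl
    return impl_cls()
```

With a switch of this shape, dependent packages keep getting the legacy behaviour unless they opt in, which is what would let them help maintain compatibility with both versions.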
One of the things I worry about here is that the current lack of definition around Access and Storage (and the entire plugin serialization model) means that repository configurations are still unstable. The current code appears to expose this in both construction of Repositories (and hence Butlers) and in the persisted configuration files (which do not appear to have explicit code for dealing with evolution). On the other hand, we may be able to present an external interface that hides all of this complexity by providing a normal use case with pre-existing or internally-generated configurations (unlike the code-based example in LDM-463 for this ticket), in which case a version switch and "don't look behind the curtain" could be acceptable. (Note that modifications to existing configurations via code or manual overrides will become a normal use case in the future, so that interface does need to be fully defined and exposed.)
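As an illustration of what "explicit code for dealing with evolution" might look like for the persisted configuration files, here is a minimal sketch. The `schema_version` key, the `root` to `roots` change, and the function names are all assumptions made up for this example; they do not describe the current file format.

```python
# Hypothetical schema numbering; the real persisted format may differ.
CURRENT_SCHEMA = 2


def _v1_to_v2(cfg):
    """Assumed change for illustration: v1 had one 'root', v2 has a list 'roots'."""
    out = dict(cfg)
    out["roots"] = [out.pop("root")]
    out["schema_version"] = 2
    return out


# Maps a schema version to the function that upgrades it by one step.
_MIGRATIONS = {1: _v1_to_v2}


def load_repo_config(raw):
    """Upgrade a persisted config dict to the current schema, step by step."""
    cfg = dict(raw)
    # Files written before versioning existed are treated as version 1.
    version = cfg.get("schema_version", 1)
    while version < CURRENT_SCHEMA:
        cfg = _MIGRATIONS[version](cfg)
        version = cfg["schema_version"]
    if version != CURRENT_SCHEMA:
        raise ValueError(f"unsupported repository config schema: {version}")
    return cfg
```

Stamping a version into each file and upgrading on read would let the on-disk format keep changing while Access and Storage are still being defined, without breaking repositories that were already written.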
So before anything is merged to master, I would like the following to take place:
- Decide which of the above strategies is to be used and implement it.
- Work through a complete example of how this primitive can be deployed in a particular use case such as the multi-version, date-range-based master calibration image repository and incorporate that into LDM-463.
- Deal with any minor code comments that I expect to make in the PR later today.
Consult https://jira.lsstcorp.org/browse/RFC-95 (search for "version") and read down from there for how they want to configure and specify repository roots. They talk about reruns a lot, and that is captured here. What is not captured:
- There may be multiple versions of a repository (such as data release 1 and data release 2), and users need to be able to select between them easily.
- We also want to be able to select different versions of different reference catalogs using the butler (right now they are selected through EUPS).
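One possible shape for this kind of selection is a small lookup from a repository name and version to a root. This is a sketch only: the `RepositoryVersions` class, its methods, and the paths in the usage lines are hypothetical and are not taken from the Butler code or from RFC-95.

```python
class RepositoryVersions:
    """Hypothetical lookup of repository roots by (name, version)."""

    def __init__(self):
        self._roots = {}
        self._defaults = {}

    def register(self, name, version, root, default=False):
        """Record a root; the first version registered is the default
        until another registration passes default=True."""
        self._roots[(name, version)] = root
        if default or name not in self._defaults:
            self._defaults[name] = version

    def resolve(self, name, version=None):
        """Return the root for a version, or for the default version."""
        if version is None:
            if name not in self._defaults:
                raise KeyError(f"no repository named {name!r}")
            version = self._defaults[name]
        try:
            return self._roots[(name, version)]
        except KeyError:
            raise KeyError(f"no version {version!r} of {name!r}") from None


# Usage with made-up names and paths:
versions = RepositoryVersions()
versions.register("data_release", "1", "/repos/dr1")
versions.register("data_release", "2", "/repos/dr2", default=True)
versions.register("reference_catalog", "a", "/refcats/a")
default_root = versions.resolve("data_release")
older_root = versions.resolve("data_release", "1")
```

The same lookup would serve both cases above: choosing between data releases and choosing a reference catalog version without going through EUPS.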