With major production runs becoming more common (both at NCSA and in France, as I've recently discovered), I believe we're (re-)developing a need for software packages that describe the pipelines and configuration used in a particular production (as in "Alert Production" or "Data Release Production"; I think different versions of one of these packages would describe particular production runs).
This is of course similar to the old "datarel" package, but the stack has changed considerably since it was in active use and I'm not really proposing that we try to use it as a starting point.
I think we should ultimately have different packages for different productions, including at least AP and DRP and a few different kinds of calibration-products productions that happen on different cadences. It's possible HSC may want its own package to describe its data release processing, if that needs to differ from LSST DRP.
With so many packages, I believe it's critical that they be as lightweight as possible. In the SuperTask era, I see these containing lightweight pipeline descriptions (essentially just lists of SuperTask classes to be run and a set of config overrides for them). In the nearer term, because any description of what is run in a production requires significant code to control the high-level workflow (e.g. the pipe_drivers scripts), I think these packages will simply lack this information.
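To give a rough sense of how lightweight such a description could be, here is a minimal sketch in plain Python. The `PipelineDescription` class, the task names, and the config field name are all hypothetical placeholders; nothing here reflects an actual SuperTask API, which still needs to be designed.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineDescription:
    """Hypothetical container: an ordered list of task names plus config overrides."""

    name: str
    # Fully qualified names of the tasks to run, in order (illustrative values only).
    tasks: list = field(default_factory=list)
    # Per-task config overrides, keyed by task name, then by config field name.
    overrides: dict = field(default_factory=dict)

    def overrides_for(self, task):
        """Return a copy of the overrides for one task, or an empty dict if none exist."""
        return dict(self.overrides.get(task, {}))


# Placeholder names, not real stack classes or config fields.
drp = PipelineDescription(
    name="data_release_production",
    tasks=["example.ProcessExposureTask", "example.MakeCoaddTask"],
    overrides={"example.MakeCoaddTask": {"exampleThreshold": 5.0}},
)
```

The point of the sketch is only that a production package could be little more than data: a list of names and a dictionary of overrides, with no workflow logic of its own.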
In the very near term, however, they'll give us a place to put configuration files that map Butler data products to more user-friendly database schemas; I think that means we'd want to move the definition of those schemas out of cat and into these per-production packages.
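As an illustration of what such a mapping file might contain, here is a small sketch. It assumes JSON purely for the sake of example (the actual format is undecided), and the table name `ExampleObject` and all column names are placeholders rather than the real baseline schema; only the dataset name `deepCoadd_meas` is a real Butler dataset.

```python
import json

# Hypothetical mapping from Butler dataset types to database tables and columns.
# The table and column names below are placeholders, not the baseline schema.
MAPPING_JSON = """
{
  "deepCoadd_meas": {
    "table": "ExampleObject",
    "columns": {"id": "objectId", "coord_ra": "ra", "coord_dec": "decl"}
  }
}
"""


def rename_columns(dataset_type, record, mapping):
    """Translate one catalog record into (table name, row) using the mapping."""
    entry = mapping[dataset_type]
    columns = entry["columns"]
    # Keep only the fields the mapping knows about, renamed to schema columns.
    row = {columns[key]: value for key, value in record.items() if key in columns}
    return entry["table"], row


mapping = json.loads(MAPPING_JSON)
table, row = rename_columns(
    "deepCoadd_meas", {"id": 42, "coord_ra": 1.5, "extra": 0}, mapping
)
```

Here `table` is `"ExampleObject"` and `row` holds only the renamed `objectId` and `ra` fields; the unmapped `extra` field is dropped.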
In the medium term, I think we should add an automatic configuration override mechanism (like that in the obs_ packages) that is invoked prior to those in the obs_ packages. This would give us a camera-generic (but production-specific) location for configuration, potentially allowing us to move many common overrides out of the obs_ packages and into a shared location.
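The ordering I have in mind can be sketched with plain dictionaries standing in for config objects. The function name and all config field names below are hypothetical; the real mechanism would operate on task config classes rather than dicts.

```python
def apply_override_layers(defaults, *layers):
    """Apply override layers in order; a later layer wins over an earlier one."""
    config = dict(defaults)
    for layer in layers:
        config.update(layer)
    return config


# Placeholder field names and values, for illustration only.
defaults = {"doExampleStep": True, "exampleThreshold": 5.0, "examplePixelScale": 0.2}
production_overrides = {"doExampleStep": False, "exampleThreshold": 10.0}
obs_overrides = {"exampleThreshold": 7.5, "examplePixelScale": 0.168}

# Production-level overrides are applied first, then the camera-specific ones.
config = apply_override_layers(defaults, production_overrides, obs_overrides)
```

Because the production layer is applied first, a camera package can still supersede a production-wide value (here `exampleThreshold`), while settings that no camera touches (here `doExampleStep`) live in one shared place.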
I don't expect to work out all of these details in this RFC; I think some of them may need a more complete SuperTask design to flesh out. So right now, I'm specifically proposing that we
- create "data_release_production", "alert_production" and "master_calibration_production" packages
- move baseline schema description information from cat to these packages
- add information that relates Butler datasets (e.g. deepCoadd_meas) to the baseline schema in a TBD machine-readable format
- add (probably in a different implementation ticket) config directories and devise a hook to allow CmdLineTask to look up and apply the configuration overrides they hold. For now, this will require an explicit command-line option specifying the production package.
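For the last item, the lookup side of such a hook might look roughly like the sketch below. It is not CmdLineTask code: the `--production-dir` option name, the `find_production_override` helper, and the `config/<taskName>.py` layout are all assumptions made for illustration, and the temporary directory stands in for an installed production package.

```python
import argparse
import os
import tempfile


def find_production_override(package_dir, task_name):
    """Return the path to config/<task_name>.py in package_dir, or None if absent."""
    path = os.path.join(package_dir, "config", task_name + ".py")
    return path if os.path.exists(path) else None


parser = argparse.ArgumentParser()
# Hypothetical option name; the proposal only says an explicit option is needed.
parser.add_argument("--production-dir", default=None)

# Build a throwaway package layout so the sketch can run on its own.
with tempfile.TemporaryDirectory() as package_dir:
    os.makedirs(os.path.join(package_dir, "config"))
    with open(os.path.join(package_dir, "config", "exampleTask.py"), "w") as f:
        f.write("config.exampleThreshold = 10.0\n")

    args = parser.parse_args(["--production-dir", package_dir])
    found = find_production_override(args.production_dir, "exampleTask")
    missing = find_production_override(args.production_dir, "otherTask")
    found_name = os.path.basename(found)
```

A task with no override file in the production package simply gets `None` back, so the hook would be a no-op for it.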
I'd be very interested in collecting other ideas for what else these packages might need to hold in the future and any other requirements we'd put on them.