Details
- Type: RFC
- Status: Implemented
- Resolution: Done
- Component/s: DM
- Labels: None
Description
The Gen3 butler followed its predecessor in taking responsibility for packing data IDs of certain types deterministically and reversibly into integer IDs (and for reporting the number of bits those IDs require), which our code uses both to mangle into integer source IDs and as random number seeds. The implementation is primarily in the DimensionPacker classes in daf_butler, with some derived classes in the skymap package. There is also logic in astro_metadata_translator to compute a detector_exposure_id integer, which is kept consistent with the DimensionPacker implementation for the same combination only via constant vigilance.
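As a rough illustration of what "deterministically and reversibly" means here, the following is a minimal sketch of mixed-radix packing for a (visit, detector) pair. It is not the DimensionPacker implementation; the function names and the `visit_max` / `detector_max` parameters (treated as exclusive upper bounds) are assumptions made for the example.

```python
from __future__ import annotations


def pack(visit: int, detector: int, visit_max: int, detector_max: int) -> int:
    """Pack (visit, detector) into one integer.

    Both maxima are exclusive upper bounds (hypothetical convention).
    """
    if not (0 <= visit < visit_max and 0 <= detector < detector_max):
        raise ValueError("data ID value out of range for this packer")
    return visit * detector_max + detector


def unpack(packed: int, detector_max: int) -> tuple[int, int]:
    """Invert pack(): recover (visit, detector) from the packed integer."""
    return divmod(packed, detector_max)


def max_bits(visit_max: int, detector_max: int) -> int:
    """Number of bits needed to hold the largest possible packed ID."""
    return (visit_max * detector_max - 1).bit_length()
```

Note that both `unpack` and `max_bits` depend on the assumed maxima, so the packed integer alone is not enough to recover the data ID.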
For these kinds of packers to work, they need to know the ranges of values various dimensions can take, and that is often problematic. For example:
- We've baked some pretty arbitrary guesses at maximum values for visit and exposure IDs into the instrument dimension records that are stored in the butler (making them quite hard to change).
- The packer for (tract, patch, band) data IDs lives in skymap, and it needs to know, in advance and hard-coded, the full list of all bands anyone might ever use. That's clearly not practical, and while I intend to just keep extending that list as needed for now (DM-29944 is what prompted this RFC), it's just a bad situation to be in.
Moreover, because the state for these packers (e.g. maximum values) is either in the data repository or fixed in the code, we cannot change it without breaking our ability to unpack IDs in existing on-disk datasets.
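To make the fragility concrete, here is a toy sketch (not the skymap packer, which also handles tracts) of packing a (band, patch) pair against a hard-coded band list. The band tuples, function names, and `n_patches` value are all hypothetical. In this sketch, appending a band keeps old IDs readable, while inserting one anywhere else silently changes what existing IDs unpack to.

```python
def pack_band_patch(band, patch, bands, n_patches):
    """Pack (band, patch) using the band's position in a fixed list."""
    return bands.index(band) * n_patches + patch


def unpack_band_patch(packed, bands, n_patches):
    """Invert pack_band_patch() for the same band list and patch count."""
    band_index, patch = divmod(packed, n_patches)
    return bands[band_index], patch


OLD_BANDS = ("g", "r", "i")
APPENDED_BANDS = ("g", "r", "i", "N921")  # new band added at the end
INSERTED_BANDS = ("N921", "g", "r", "i")  # new band added at the front

# An ID written to disk while OLD_BANDS was in effect.
packed = pack_band_patch("r", 7, OLD_BANDS, n_patches=100)

still_ok = unpack_band_patch(packed, APPENDED_BANDS, n_patches=100)
now_wrong = unpack_band_patch(packed, INSERTED_BANDS, n_patches=100)
```

Here `still_ok` recovers the original band and patch, while `now_wrong` reports a different band for the same integer.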
To deal with this, I propose that we:
- Create new data ID packer classes whose state is provided exclusively by pex_config configuration. Task code that wishes to use these would nest the packer's configuration within its own, and obs_ package configs and pipeline definitions would provide instrument- or dataset-specific defaults for those ranges of allowable values, to give task-runners similar levels of convenience.
- The DimensionPacker hierarchy in daf_butler and the APIs that access it would be deprecated and removed (deprecation only after the new classes are available; removal a release cycle later, as usual).
- VisitInfo.getExposureId (or any post-RFC-459 accessor for a combined visit+detector or exposure+detector integer ID) will be deprecated and ultimately removed; algorithm code that wants such an ID should instead get it via another parameter that can be met via the new configuration-based packers.
- VisitInfo.getExposureId will move to Exposure.id, and will be an opaque ID appropriate for just that image, usually generated in a way that is consistent with one of the new config-driven ID packers (hopefully populated by actually using them, at least most of the time). VisitInfo.id will be created as an opaque ID for either the exposure or visit ID recognized by the butler (appropriately for what that image represents).
- astro_metadata_translator.ObservationInfo.detector_exposure_id will be deprecated and ultimately removed.
In addition, the instrument dimension columns exposure_max and visit_max may be removed in a future schema migration (but this would be accompanied by its own RFC, like any other migration).
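The shape of the proposal can be sketched as follows. This is only an illustration under stated assumptions: plain dataclasses stand in for pex_config classes so the example runs without the LSST stack, and every name here (`PackerConfig`, `ExampleTaskConfig`, `ConfigDrivenPacker`, `n_visits`, `n_detectors`) is hypothetical rather than the API that was eventually implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class PackerConfig:
    """Hypothetical stand-in for a pex_config Config holding all packer state."""

    n_visits: int = 1 << 20  # exclusive upper bound on visit IDs
    n_detectors: int = 256   # exclusive upper bound on detector IDs


@dataclass(frozen=True)
class ExampleTaskConfig:
    """Hypothetical task config that nests the packer's configuration."""

    id_packer: PackerConfig = field(default_factory=PackerConfig)


class ConfigDrivenPacker:
    """Packer whose only state is its configuration."""

    def __init__(self, config: PackerConfig) -> None:
        self.config = config

    @property
    def max_bits(self) -> int:
        """Bits needed for the largest ID this configuration can produce."""
        return (self.config.n_visits * self.config.n_detectors - 1).bit_length()

    def pack(self, visit: int, detector: int) -> int:
        cfg = self.config
        if not (0 <= visit < cfg.n_visits and 0 <= detector < cfg.n_detectors):
            raise ValueError("data ID value out of configured range")
        return visit * cfg.n_detectors + detector

    def unpack(self, packed: int) -> tuple[int, int]:
        return divmod(packed, self.config.n_detectors)


# An obs_ package or pipeline definition would supply overrides like this one.
instrument_config = ExampleTaskConfig(
    id_packer=PackerConfig(n_visits=10_000, n_detectors=200)
)
packer = ConfigDrivenPacker(instrument_config.id_packer)
```

Because the ranges travel with the task configuration rather than living in the data repository or in code, the configuration that produced a dataset is what would be needed to unpack its IDs later.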
Issue Links
- is triggering: DM-31924 Design and implement configurable data ID packing system (Done)
- relates to: RFC-917 Data ID -> integer packing for Rubin instruments and related code removals (Implemented)
- relates to: DM-38687 Remove code deprecated on DM-31924 (To Do)
- relates to: DM-29944 Add some narrow-band filters to skymap's tract+patch+band data ID packers (Done)
- relates to: DM-13944 add id to VisitInfo (Done)
- relates to: RFC-459 Remove exposureId from VisitInfo and add visitId (Implemented)
This is a good way to frame the problem. One way to put my concern is that I fear we are coupling Exposures with an identifier that is only associated with that object in a particular, implicit context (a particular pipeline configuration with a particular ID-packing algorithm). Worse, the identifier may actually be associated with some other related object (e.g. the raw this calexp was built from) and not be appropriate for that Exposure at all. I'm only slightly bothered by not coupling related things, and much more bothered by coupling unrelated things.
But I think the only uses of detector_exposure_id are via the badly-named number it stuff in VisitInfo, and I don't actually know what uses that.
If the answer doesn't include "generate source and object IDs", then I have less of a problem with it; as I mentioned in my last post, it's the "how many bits could this consume" question that brings in the need for extra state/context.