RFC-785

Move data ID -> integer packing state and logic out of butler



    • RFC
    • Status: Implemented
    • Resolution: Done
    • DM
    • None


      The Gen3 butler followed its predecessor in taking responsibility for deterministically and reversibly packing data IDs of certain types into integer IDs (and for reporting the number of bits those IDs require). Our code both mangles these packed IDs into integer source IDs and uses them as random number seeds.  The implementation is primarily in the DimensionPacker classes in daf_butler, with some derived classes in the skymap package. There is also logic in astro_metadata_translator to compute a detector_exposure_id integer, which is kept consistent with the DimensionPacker implementation for the same combination only via constant vigilance.
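      As a minimal sketch of what "deterministically and reversibly" means here, the following packs a (visit, detector) data ID into one integer with mixed-radix arithmetic and reports the number of bits required. The function names and the upper bounds are hypothetical illustrations, not the DimensionPacker API.

```python
def pack(visit: int, detector: int, detector_max: int) -> int:
    """Pack (visit, detector) into a single integer.

    detector_max is a hypothetical exclusive upper bound on detector.
    """
    if visit < 0:
        raise ValueError("visit must be non-negative")
    if not 0 <= detector < detector_max:
        raise ValueError("detector out of range")
    return visit * detector_max + detector


def unpack(packed: int, detector_max: int) -> tuple[int, int]:
    """Invert pack(); only correct if detector_max matches the packing value."""
    return divmod(packed, detector_max)


def max_bits(visit_max: int, detector_max: int) -> int:
    """Bits needed for the largest packed ID (both bounds exclusive)."""
    return (visit_max * detector_max - 1).bit_length()


packed = pack(visit=1234, detector=42, detector_max=200)
assert unpack(packed, detector_max=200) == (1234, 42)
```

      The round trip only works because both sides agree on detector_max, which is why the packer needs to know the range of each dimension.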

      For these kinds of packers to work, they need to know the ranges of values various dimensions can take, and that is often problematic.  For example:

      • We've baked some pretty arbitrary guesses at maximum values for visit and exposure IDs into the instrument dimension records that are stored in the butler (making them quite hard to change).
      • The packer for (tract, patch, band) data IDs lives in skymap, and it needs to know, in advance and hard-coded, the full list of all bands anyone might ever use.  That's clearly not practical, and while I intend to just keep extending that list as needed for now (DM-29944 is what prompted this RFC), it's just a bad situation to be in.

      Moreover, because the state for these packers (e.g. maximum values) is either in the data repository or fixed in the code, we cannot change it without breaking our ability to unpack IDs in existing on-disk datasets.
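      The problem can be illustrated with a toy packer (a sketch with hypothetical names and bounds, not the actual daf_butler layout): an ID written under one maximum value unpacks to a different data ID once that maximum changes.

```python
def pack(exposure: int, detector: int, detector_max: int) -> int:
    """Toy packer: detector occupies the low 'digit' with radix detector_max."""
    return exposure * detector_max + detector


def unpack(packed: int, detector_max: int) -> tuple[int, int]:
    """Toy unpacker: assumes the same detector_max that was used to pack."""
    return divmod(packed, detector_max)


OLD_DETECTOR_MAX = 200  # hypothetical bound in effect when datasets were written
NEW_DETECTOR_MAX = 250  # hypothetical bound after someone "fixes" the guess

stored_id = pack(1234, 42, OLD_DETECTOR_MAX)          # persisted on disk
recovered_old = unpack(stored_id, OLD_DETECTOR_MAX)   # the original data ID
recovered_new = unpack(stored_id, NEW_DETECTOR_MAX)   # a different data ID
```

      Here recovered_old is (1234, 42) while recovered_new is (987, 92): nothing fails loudly, the stored ID simply decodes to the wrong exposure and detector.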

      To deal with this, I propose that we:

      1. Create new data ID packer classes whose state is provided exclusively by pex_config configuration.  Task code that wishes to use these would nest the packer's configuration within its own, and obs_ package configs and pipeline definitions would be used to provide instrument- or dataset-specific defaults for those ranges of allowable values, to give task-runners similar levels of convenience.
      2. Deprecate and remove the DimensionPacker hierarchy in daf_butler and the APIs that access it (deprecation only after the new classes are available; removal a release cycle later, as usual).
      3. Deprecate and ultimately remove VisitInfo.getExposureId (or any post-RFC-459 accessor for a combined visit+detector or exposure+detector integer ID); algorithm code that wants such an ID should instead get it via another parameter that can be met via the new configuration-based packers. The value currently returned by VisitInfo.getExposureId will move to Exposure.id, and will be an opaque ID appropriate for just that image, usually generated in a way that is consistent with one of the new config-driven ID packers (hopefully populated by actually using them, at least most of the time). VisitInfo.id will be added as an opaque ID holding either the exposure ID or the visit ID recognized by the butler (whichever is appropriate for what that image represents).
      4. Deprecate and ultimately remove astro_metadata_translator.ObservationInfo.detector_exposure_id.
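      Item 1 might look roughly like the sketch below. To keep it self-contained, plain dataclasses stand in for pex_config classes; every class, field, and value here (PackerConfig, VisitDetectorPacker, MeasureTaskConfig, id_packer, the bounds) is hypothetical and not an existing or proposed API.

```python
from dataclasses import dataclass, field


@dataclass
class PackerConfig:
    """Hypothetical stand-in for a pex_config Config holding all packer state."""
    n_visits: int = 1 << 20   # exclusive upper bound on visit (made-up default)
    n_detectors: int = 256    # exclusive upper bound on detector (made-up default)


class VisitDetectorPacker:
    """Hypothetical packer whose only state is its configuration."""

    def __init__(self, config: PackerConfig):
        self.config = config

    def pack(self, visit: int, detector: int) -> int:
        if not 0 <= visit < self.config.n_visits:
            raise ValueError("visit out of configured range")
        if not 0 <= detector < self.config.n_detectors:
            raise ValueError("detector out of configured range")
        return visit * self.config.n_detectors + detector

    def unpack(self, packed: int) -> tuple[int, int]:
        return divmod(packed, self.config.n_detectors)


@dataclass
class MeasureTaskConfig:
    """Hypothetical task config that nests the packer's configuration."""
    id_packer: PackerConfig = field(default_factory=PackerConfig)


# An obs_ package config override or pipeline definition would set
# instrument-specific values; 200 is an invented number for illustration.
task_config = MeasureTaskConfig()
task_config.id_packer.n_detectors = 200

packer = VisitDetectorPacker(task_config.id_packer)
packed = packer.pack(visit=1234, detector=42)
assert packer.unpack(packed) == (1234, 42)
```

      Because the bounds travel with the task configuration rather than living in the data repository or in library code, a later change to the defaults would not alter how IDs produced under an earlier, persisted configuration are interpreted.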

      In addition, the instrument dimension columns exposure_max and visit_max may be removed in a future schema migration (but this would be accompanied by its own RFC, like any other migration).





              jbosch Jim Bosch
              Clare Saunders, Jim Bosch, John Parejko, Kian-Tat Lim, Meredith Rawls, Tim Jenness, Yusra AlSayyad