Uploaded image for project: 'Request For Comments'
  1. Request For Comments
  2. RFC-842

Fixing Versioned Cameras

    XMLWordPrintable

    Details

    • Type: RFC
    • Status: Adopted
    • Resolution: Unresolved
    • Component/s: DM
    • Labels:
      None

      Description

      Problem Statement

      The concept of a versioned camera - in which afw.cameraGeom content could be associated with validity ranges, like flats and biases - was one of the long-term feature requests that went into the development of the gen3 middleware. We now support them in the middleware, in that obs packages can write an arbitrary number of camera datasets to a butler data repository and give each its own validity range.

      But at present, all obs packages write a single camera with an infinite validity range, because the way we support versioned cameras has two major flaws:

      First, we have a few PipelineTasks that assume the existence of a single, unversioned camera - or more precisely, they attempt to read a camera from a data repository without saying when it needs to be valid. Sometimes this is completely justified, because the task just needs access to static content, like the names and IDs of detectors and amplifiers. In other cases it is a pragmatic choice when the relevant aspect of the camera changes little or not at all, and the code needs to process many observations - it is a lot simpler and faster to load one camera for all of them. Currently, these PipelineTasks need to use a rather messy set of workarounds, involving "unbounded" collections and custom connection lookupFunctions, and it will get even messier if an obs package actually does start using versioned cameras.

      Second, by versioning the combination of all of the things that go into a camera, we make it much harder to organize the source of truth and write tasks that constrain just one part of a camera. We've thus far avoided this problem by having these tasks write separate IsrCalib datasets and giving IsrTask the ability to read things like gains, linearizers, and crosstalk from either Camera objects or IsrCalibs. This might be tolerable if IsrTask was the only consumer of these calibrations, but it isn't, and this will only get worse as commissioning proceeds.

      Proposal

      To address these problems, Robert Lupton and I are proposing that:

      • obs packages be responsible only for defining a nominal, static camera, and writing this to the butler data repository (i.e. we will formalize what they do now as what they should continue to do);
      • the current "camera" butler dataset type will be used to refer to this static, nominal camera, and we will stop associating it with a validity range;
      • IsrCalib datasets in butler data repositories continue to be the way that calibrations that may be included in a camera are produced and persisted (again, do what we are already doing);
      • we provide factory functions for cameras and detectors that load IsrCalib datasets and nominal cameras from butler data repositories and update the camera/detector with IsrCalib values for a particular epoch. These will be the preferred way for all code (including IsrTask) to obtain values that are present in both cameras and IsrCalibs. In some cases, where accessing a value from a nominal camera only is more likely to be a mistake than a reasonable use of a mostly-static quantity, the obs-package value may be set to NaN or None, forcing all users of those values to go through the versioned-camera factories (and provide the temporal information necessary to do so).

      I've prototyped these factory functions and the interface used by IsrCalibs to implement them on this PR.
      There's a fair amount of sophistication to allow the IsrCalib updates to be customized in various ways, but typical usage should be very simple, e.g.:

      detector = my_instrument.load_detector(11, butler, exposure=500)
      

      or

      camera = my_instrument.load_camera(
          butler,
          day_obs=20250129,
          seq_num=1261,
          ptc=False,
      )
      

      Potential concerns and mitigations

      Subclassing IsrCalib in Python allows for much more freedom in parameterizing aspects of a camera than the afw.cameraGeom classes, which are implemented in C++ and must be able to serialize their state in a backwards-compatible way. If we are using or plan to use parameterizations that are not already in cameraGeom (I suspect we are, but hope Christopher Waters can confirm), implementing this RFC will be a lot harder. Instead of modifying the C++ cameraGeom code, it may be better to remove these attributes from the C++ classes and FITS serialization entirely, replacing them with more general pybind11-layer machinery for attaching pure-Python objects to the Camera, Detector, and Amplifier classes. These would not be accessible to C++ code, but we probably want them to round-trip through C++ calls (e.g. attaching a Detector to an Exposure).

      While most use of versioned cameras will not be in performance-critical contexts, IsrTask absolutely is, and always passing camera-relevant IsrCalib data though afw.cameraGeom objects will have some performance impact. This should be negligible, but we should benchmark before fully replacing the current direct use of IsrCalib objects there.

      Finally, it should be acknowledged that this proposal is itself a compromise between its authors on two points that may also occur to others:

      • Jim Bosch is a little bothered by the (minor) interface segregation violation here (i.e. Camera has many things some of its users don't care about, and must depend on nonetheless), but a plethora of small interfaces would conflict with Robert Lupton's desire to make it as simple as possible for non-expert users to find any camera calibration information.
      • Robert Lupton would prefer for the access to a versioned camera go through Butler.get; factory functions are a concession to Jim Bosch to manage internal butler complexity.

      Related (but mostly out-of-scope) issues

      This proposal provides a consistent way to get an appropriate versioned cameraGeom object if a butler data repository with an appropriately set up calibration collection is available. It does not attempt to solve the problem of setting up those data repositories.

      Previous design discussions on this topic have included the notion of obs_*_data git repositories for versioned camera and other calibrations.  These packages could play a role in making it easier to set up data repositories, as a place to keep exported calibrations. But with versioned camera data produced by PipelineTasks as IsrCalibs and written directly to the butler data repository, it does not make sense to inject a git repository into the usual process of loading a camera, and hence it should be a central butler data repository, not these git repositories, that serves as the ultimate source of truth. For human-edited camera data, a git repository of course makes more sense.

        Attachments

          Issue Links

            Activity

            Hide
            tjenness Tim Jenness added a comment -

            I still don't think the C++ Camera object is the best place for this since it can't make use of all the IsrCalib python objects. Jim Bosch you filed this RFC, would you like to comment as to the current status (or hand it over to Christopher Waters)?

            Show
            tjenness Tim Jenness added a comment - I still don't think the C++ Camera object is the best place for this since it can't make use of all the IsrCalib python objects. Jim Bosch you filed this RFC, would you like to comment as to the current status (or hand it over to Christopher Waters )?
            Hide
            jbosch Jim Bosch added a comment -

            Here's a modified proposal:

            • Use the C++ Camera class to store only the nominal/static content, and deprecate most fields that that have IsrCalib alternatives (there may be some where having an approximate nominal/static value is better than having no value, even if that nominal/static value should not be used in production).
            • Add pure-Python classes that wrap the C++ Camera, Detector, and Amplifier classes, with duck-typed interfaces whenever possible. These could also hold IsrCalib objects and provide access to them at the appropriate level.
            • New butler-backed factories would return instances of those new Python classes.
            • The new Python classes would not be persistable as part of an Exposure. I suspect it'd be a mistake to make them persistable as monolithic files, but it may make sense to make them persistable to/from a directory of files (ideally equivalent to a Butler.retrieve_artifacts call on their constituent datasets).
            Show
            jbosch Jim Bosch added a comment - Here's a modified proposal: Use the C++ Camera class to store only the nominal/static content, and deprecate most fields that that have IsrCalib alternatives (there may be some where having an approximate nominal/static value is better than having no value, even if that nominal/static value should not be used in production). Add pure-Python classes that wrap the C++ Camera, Detector, and Amplifier classes, with duck-typed interfaces whenever possible. These could also hold IsrCalib objects and provide access to them at the appropriate level. New butler-backed factories would return instances of those new Python classes. The new Python classes would not be persistable as part of an Exposure. I suspect it'd be a mistake to make them persistable as monolithic files, but it may make sense to make them persistable to/from a directory of files (ideally equivalent to a Butler.retrieve_artifacts call on their constituent datasets).
            Hide
            tjenness Tim Jenness added a comment -

            Jim Bosch I like this proposal. We should consider storing the UUIDs of the calibration datasets in the persistence files as provenance entries so that in theory you could retrieve them later.

            Show
            tjenness Tim Jenness added a comment - Jim Bosch I like this proposal. We should consider storing the UUIDs of the calibration datasets in the persistence files as provenance entries so that in theory you could retrieve them later.
            Hide
            jbosch Jim Bosch added a comment -

            I've repurposed the prototyping ticket linked previously as the implementation ticket, since it wasn't going to be merged as-is anyway, and adopted.

            Actually scheduling the work is another story, however; I'm not at all sure how to prioritize it (or even who I'd ask to do it, other than myself and perhaps Christopher Waters).

            Show
            jbosch Jim Bosch added a comment - I've repurposed the prototyping ticket linked previously as the implementation ticket, since it wasn't going to be merged as-is anyway, and adopted. Actually scheduling the work is another story, however; I'm not at all sure how to prioritize it (or even who I'd ask to do it, other than myself and perhaps Christopher Waters ).
            Hide
            czw Christopher Waters added a comment -

            I'm happy to work on this, but I have it ranked as "least urgent" in my head.

            Show
            czw Christopher Waters added a comment - I'm happy to work on this, but I have it ranked as "least urgent" in my head.

              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              jbosch Jim Bosch
              Watchers:
              Christopher Waters, James Chiang, Jim Bosch, John Parejko, Kian-Tat Lim, Robert Lupton, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated:
                Planned End:

                  Jenkins

                  No builds found.