The concept of a versioned camera - in which afw.cameraGeom content could be associated with validity ranges, like flats and biases - was one of the long-term feature requests that went into the development of the gen3 middleware. We now support them in the middleware, in that obs packages can write an arbitrary number of camera datasets to a butler data repository and give each its own validity range.
But at present, all obs packages write a single camera with an infinite validity range, because the way we support versioned cameras has two major flaws:
First, we have a few PipelineTasks that assume the existence of a single, unversioned camera - or more precisely, they attempt to read a camera from a data repository without saying when it needs to be valid. Sometimes this is completely justified, because the task just needs access to static content, like the names and IDs of detectors and amplifiers. In other cases it is a pragmatic choice when the relevant aspect of the camera changes little or not at all, and the code needs to process many observations - it is a lot simpler and faster to load one camera for all of them. Currently, these PipelineTasks need to use a rather messy set of workarounds, involving "unbounded" collections and custom connection lookupFunctions, and it will get even messier if an obs package actually does start using versioned cameras.
Second, by versioning the combination of all of the things that go into a camera, we make it much harder to organize the source of truth and write tasks that constrain just one part of a camera. We've thus far avoided this problem by having these tasks write separate IsrCalib datasets and giving IsrTask the ability to read things like gains, linearizers, and crosstalk from either Camera objects or IsrCalibs. This might be tolerable if IsrTask was the only consumer of these calibrations, but it isn't, and this will only get worse as commissioning proceeds.
To address these problems, Robert Lupton and I are proposing that:
- obs packages be responsible only for defining a nominal, static camera, and writing this to the butler data repository (i.e. we will formalize what they do now as what they should continue to do);
- the current "camera" butler dataset type will be used to refer to this static, nominal camera, and we will stop associating it with a validity range;
- IsrCalib datasets in butler data repositories continue to be the way that calibrations that may be included in a camera are produced and persisted (again, do what we are already doing);
- we provide factory functions for cameras and detectors that load IsrCalib datasets and nominal cameras from butler data repositories and update the camera/detector with IsrCalib values for a particular epoch. These will be the preferred way for all code (including IsrTask) to obtain values that are present in both cameras and IsrCalibs. In some cases, where accessing a value from a nominal camera only is more likely to be a mistake than a reasonable use of a mostly-static quantity, the obs-package value may be set to NaN or None, forcing all users of those values to go through the versioned-camera factories (and provide the temporal information necessary to do so).
I've prototyped these factory functions and the interface used by IsrCalibs to implement them on this PR.
There's a fair amount of sophistication to allow the IsrCalib updates to be customized in various ways, but typical usage should be very simple, e.g.:
Subclassing IsrCalib in Python allows for much more freedom in parameterizing aspects of a camera than the afw.cameraGeom classes, which are implemented in C++ and must be able to serialize their state in a backwards-compatible way. If we are using or plan to use parameterizations that are not already in cameraGeom (I suspect we are, but hope Christopher Waters can confirm), implementing this RFC will be a lot harder. Instead of modifying the C++ cameraGeom code, it may be better to remove these attributes from the C++ classes and FITS serialization entirely, replacing them with more general pybind11-layer machinery for attaching pure-Python objects to the Camera, Detector, and Amplifier classes. These would not be accessible to C++ code, but we probably want them to round-trip through C++ calls (e.g. attaching a Detector to an Exposure).
While most use of versioned cameras will not be in performance-critical contexts, IsrTask absolutely is, and always passing camera-relevant IsrCalib data though afw.cameraGeom objects will have some performance impact. This should be negligible, but we should benchmark before fully replacing the current direct use of IsrCalib objects there.
Finally, it should be acknowledged that this proposal is itself a compromise between its authors on two points that may also occur to others:
- Jim Bosch is a little bothered by the (minor) interface segregation violation here (i.e. Camera has many things some of its users don't care about, and must depend on nonetheless), but a plethora of small interfaces would conflict with Robert Lupton's desire to make it as simple as possible for non-expert users to find any camera calibration information.
- Robert Lupton would prefer for the access to a versioned camera go through Butler.get; factory functions are a concession to Jim Bosch to manage internal butler complexity.
This proposal provides a consistent way to get an appropriate versioned cameraGeom object if a butler data repository with an appropriately set up calibration collection is available. It does not attempt to solve the problem of setting up those data repositories.
Previous design discussions on this topic have included the notion of obs_*_data git repositories for versioned camera and other calibrations. These packages could play a role in making it easier to set up data repositories, as a place to keep exported calibrations. But with versioned camera data produced by PipelineTasks as IsrCalibs and written directly to the butler data repository, it does not make sense to inject a git repository into the usual process of loading a camera, and hence it should be a central butler data repository, not these git repositories, that serves as the ultimate source of truth. For human-edited camera data, a git repository of course makes more sense.