Uploaded image for project: 'Request For Comments'
  1. Request For Comments
  2. RFC-842

Fixing Versioned Cameras

    XMLWordPrintable

    Details

    • Type: RFC
    • Status: Adopted
    • Resolution: Unresolved
    • Component/s: DM
    • Labels:
      None

      Description

      Problem Statement

      The concept of a versioned camera - in which afw.cameraGeom content could be associated with validity ranges, like flats and biases - was one of the long-term feature requests that went into the development of the gen3 middleware. We now support them in the middleware, in that obs packages can write an arbitrary number of camera datasets to a butler data repository and give each its own validity range.

      But at present, all obs packages write a single camera with an infinite validity range, because the way we support versioned cameras has two major flaws:

      First, we have a few PipelineTasks that assume the existence of a single, unversioned camera - or more precisely, they attempt to read a camera from a data repository without saying when it needs to be valid. Sometimes this is completely justified, because the task just needs access to static content, like the names and IDs of detectors and amplifiers. In other cases it is a pragmatic choice when the relevant aspect of the camera changes little or not at all, and the code needs to process many observations - it is a lot simpler and faster to load one camera for all of them. Currently, these PipelineTasks need to use a rather messy set of workarounds, involving "unbounded" collections and custom connection lookupFunctions, and it will get even messier if an obs package actually does start using versioned cameras.

      Second, by versioning the combination of all of the things that go into a camera, we make it much harder to organize the source of truth and write tasks that constrain just one part of a camera. We've thus far avoided this problem by having these tasks write separate IsrCalib datasets and giving IsrTask the ability to read things like gains, linearizers, and crosstalk from either Camera objects or IsrCalibs. This might be tolerable if IsrTask was the only consumer of these calibrations, but it isn't, and this will only get worse as commissioning proceeds.

      Proposal

      To address these problems, Robert Lupton and I are proposing that:

      • obs packages be responsible only for defining a nominal, static camera, and writing this to the butler data repository (i.e. we will formalize what they do now as what they should continue to do);
      • the current "camera" butler dataset type will be used to refer to this static, nominal camera, and we will stop associating it with a validity range;
      • IsrCalib datasets in butler data repositories continue to be the way that calibrations that may be included in a camera are produced and persisted (again, do what we are already doing);
      • we provide factory functions for cameras and detectors that load IsrCalib datasets and nominal cameras from butler data repositories and update the camera/detector with IsrCalib values for a particular epoch. These will be the preferred way for all code (including IsrTask) to obtain values that are present in both cameras and IsrCalibs. In some cases, where accessing a value from a nominal camera only is more likely to be a mistake than a reasonable use of a mostly-static quantity, the obs-package value may be set to NaN or None, forcing all users of those values to go through the versioned-camera factories (and provide the temporal information necessary to do so).

      I've prototyped these factory functions and the interface used by IsrCalibs to implement them on this PR.
      There's a fair amount of sophistication to allow the IsrCalib updates to be customized in various ways, but typical usage should be very simple, e.g.:

      detector = my_instrument.load_detector(11, butler, exposure=500)
      

      or

      camera = my_instrument.load_camera(
          butler,
          day_obs=20250129,
          seq_num=1261,
          ptc=False,
      )
      

      Potential concerns and mitigations

      Subclassing IsrCalib in Python allows for much more freedom in parameterizing aspects of a camera than the afw.cameraGeom classes, which are implemented in C++ and must be able to serialize their state in a backwards-compatible way. If we are using or plan to use parameterizations that are not already in cameraGeom (I suspect we are, but hope Christopher Waters can confirm), implementing this RFC will be a lot harder. Instead of modifying the C++ cameraGeom code, it may be better to remove these attributes from the C++ classes and FITS serialization entirely, replacing them with more general pybind11-layer machinery for attaching pure-Python objects to the Camera, Detector, and Amplifier classes. These would not be accessible to C++ code, but we probably want them to round-trip through C++ calls (e.g. attaching a Detector to an Exposure).

      While most use of versioned cameras will not be in performance-critical contexts, IsrTask absolutely is, and always passing camera-relevant IsrCalib data though afw.cameraGeom objects will have some performance impact. This should be negligible, but we should benchmark before fully replacing the current direct use of IsrCalib objects there.

      Finally, it should be acknowledged that this proposal is itself a compromise between its authors on two points that may also occur to others:

      • Jim Bosch is a little bothered by the (minor) interface segregation violation here (i.e. Camera has many things some of its users don't care about, and must depend on nonetheless), but a plethora of small interfaces would conflict with Robert Lupton's desire to make it as simple as possible for non-expert users to find any camera calibration information.
      • Robert Lupton would prefer for the access to a versioned camera go through Butler.get; factory functions are a concession to Jim Bosch to manage internal butler complexity.

      Related (but mostly out-of-scope) issues

      This proposal provides a consistent way to get an appropriate versioned cameraGeom object if a butler data repository with an appropriately set up calibration collection is available. It does not attempt to solve the problem of setting up those data repositories.

      Previous design discussions on this topic have included the notion of obs_*_data git repositories for versioned camera and other calibrations.  These packages could play a role in making it easier to set up data repositories, as a place to keep exported calibrations. But with versioned camera data produced by PipelineTasks as IsrCalibs and written directly to the butler data repository, it does not make sense to inject a git repository into the usual process of loading a camera, and hence it should be a central butler data repository, not these git repositories, that serves as the ultimate source of truth. For human-edited camera data, a git repository of course makes more sense.

        Attachments

          Issue Links

            Activity

            Hide
            tjenness Tim Jenness added a comment -

            As a comment on the final paragraph. It was never the intent to involve a git repo in the retrieval of versioned camera or any other calibrations. The obs data packages are there for calibrations that are created outside of the pipeline processing and are human curated (defects and qe curves etc) and generally human editable (or at least not binary files). The job of the write-curated-calibrations command is to extract all those calibrations from the git repo and write them into a butler repository. We currently have a minor provenance issue in that the write-curated-calibrations command does not store these calibrations in versioned run collections – it should really be using the git version in the run name.

            You could also conceive of some pipeline-generated data products that are small having some form of export from the butler to store them in the obs data package as an easy way to seed an extrernal calibration repository (especially if provenance information could come with it).

            Show
            tjenness Tim Jenness added a comment - As a comment on the final paragraph. It was never the intent to involve a git repo in the retrieval of versioned camera or any other calibrations. The obs data packages are there for calibrations that are created outside of the pipeline processing and are human curated (defects and qe curves etc) and generally human editable (or at least not binary files). The job of the write-curated-calibrations command is to extract all those calibrations from the git repo and write them into a butler repository. We currently have a minor provenance issue in that the write-curated-calibrations command does not store these calibrations in versioned run collections – it should really be using the git version in the run name. You could also conceive of some pipeline-generated data products that are small having some form of export from the butler to store them in the obs data package as an easy way to seed an extrernal calibration repository (especially if provenance information could come with it).
            Hide
            ktl Kian-Tat Lim added a comment -

            Will it be possible for Prompt Processing to find and export/import all datasets needed to produce an appropriately-versioned camera (presumably the latest available)?

            Show
            ktl Kian-Tat Lim added a comment - Will it be possible for Prompt Processing to find and export/import all datasets needed to produce an appropriately-versioned camera (presumably the latest available)?
            Hide
            jbosch Jim Bosch added a comment -

            It was never the intent to involve a git repo in the retrieval of versioned camera or any other calibrations.

            Glad to hear we were already on the same page on this - I suspected we were, but I wasn't sure (and I'm still not sure everyone else who has thought about obs_*_data had the same understanding).

            Will it be possible for Prompt Processing to find and export/import all datasets needed to produce an appropriately-versioned camera (presumably the latest available)?

            Yes; in the prototype on the PR, that's all still delegated fully to butler calls that the Prompt Processing code could run itself. But we should make sure that remains true all the way through to the implementation, and in particular make sure we factor the code such that the lookup can be called separately from the assembly if the lookup logic outside the butler ever becomes nontrivial.

            Show
            jbosch Jim Bosch added a comment - It was never the intent to involve a git repo in the retrieval of versioned camera or any other calibrations. Glad to hear we were already on the same page on this - I suspected we were, but I wasn't sure (and I'm still not sure everyone else who has thought about obs_*_data had the same understanding). Will it be possible for Prompt Processing to find and export/import all datasets needed to produce an appropriately-versioned camera (presumably the latest available)? Yes; in the prototype on the PR, that's all still delegated fully to butler calls that the Prompt Processing code could run itself. But we should make sure that remains true all the way through to the implementation, and in particular make sure we factor the code such that the lookup can be called separately from the assembly if the lookup logic outside the butler ever becomes nontrivial.
            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            Instead of modifying the C++ cameraGeom code, it may be better to remove these attributes from the C++ classes and FITS serialization entirely, replacing them with more general pybind11-layer machinery for attaching pure-Python objects to the Camera, Detector, and Amplifier classes. These would not be accessible to C++ code, but we probably want them to round-trip through C++ calls (e.g. attaching a Detector to an Exposure).

            Based on our experience with ExposureInfo, this will make the Camera ecosystem very brittle. In ExposureInfo, the pybind11 contortions we need to attach Python objects to ExposureInfo while still holding to our memory management policies (i.e., shared pointers everywhere) don't make ExposureInfo itself too ill-behaved, but make it hard to make changes to "attached" classes (e.g., Psf) without breaking something. There's very much a One Right (and non-obvious) Way to work with these classes.

            Are the consequences of imposing this on all the IsrCalib(?) classes worth it?

            Show
            krzys Krzysztof Findeisen added a comment - - edited Instead of modifying the C++ cameraGeom code, it may be better to remove these attributes from the C++ classes and FITS serialization entirely, replacing them with more general pybind11-layer machinery for attaching pure-Python objects to the Camera , Detector , and Amplifier classes. These would not be accessible to C++ code, but we probably want them to round-trip through C++ calls (e.g. attaching a Detector to an Exposure ). Based on our experience with ExposureInfo , this will make the Camera ecosystem very brittle. In ExposureInfo , the pybind11 contortions we need to attach Python objects to ExposureInfo while still holding to our memory management policies (i.e., shared pointers everywhere) don't make ExposureInfo itself too ill-behaved, but make it hard to make changes to "attached" classes (e.g., Psf ) without breaking something. There's very much a One Right (and non-obvious) Way to work with these classes. Are the consequences of imposing this on all the IsrCalib (?) classes worth it?
            Hide
            jbosch Jim Bosch added a comment -

            Based on our experience with ExposureInfo, this will make the Camera ecosystem very brittle.

            Yeah, I'm not fond of the trampoline stuff in Psf, either, though I do still think it's a less-bad option than forcing more code into C++ (to be clear: I think the Psf trampoline work was done about as well as it could be; it's just a problem that has no good solutions). But here we may have better options, because these objects can be completely opaque to C++, and hence using a C++ data structure that holds pybind::object should be much more viable. We may need to have a tiny C++ hierarchy so that the base class doesn't need to link against Python, while the only viable derived class does, but it should still be a lot simpler than a full trampoline.

            But this also reminds me of another problem with our current mix of IsrCalib and Camera usage that this RFC solves (and that I think needs to be solves): if we don't get the IsrCalib values in to the Detector that's persisted with an Exposure, or remove them fully from the persisted Detector, then the Detector objects attached to our Exposures are essentially provenance lies: they claim (or at least strongly) imply that we used some calibrations in ISR that we definitely did not use.

            Show
            jbosch Jim Bosch added a comment - Based on our experience with ExposureInfo, this will make the Camera ecosystem very brittle. Yeah, I'm not fond of the trampoline stuff in Psf , either, though I do still think it's a less-bad option than forcing more code into C++ (to be clear: I think the Psf trampoline work was done about as well as it could be; it's just a problem that has no good solutions). But here we may have better options, because these objects can be completely opaque to C++, and hence using a C++ data structure that holds pybind::object should be much more viable. We may need to have a tiny C++ hierarchy so that the base class doesn't need to link against Python, while the only viable derived class does, but it should still be a lot simpler than a full trampoline. But this also reminds me of another problem with our current mix of IsrCalib and Camera usage that this RFC solves (and that I think needs to be solves): if we don't get the IsrCalib values in to the Detector that's persisted with an Exposure , or remove them fully from the persisted Detector , then the Detector objects attached to our Exposures are essentially provenance lies: they claim (or at least strongly) imply that we used some calibrations in ISR that we definitely did not use.
            Hide
            czw Christopher Waters added a comment - - edited

            The current usage of IsrCalibs has tried to ensure they were compatible with the camera interface.  There is additional information in the stand-alone calib, but so far no algorithm has relied on this extra info.

            {quote}

             This might be tolerable if IsrTask was the only consumer of these calibrations, but it isn't
            {quote}

            What other consumer are we expecting for the calibrations?  Is there a benefit for attaching everything to the camera, instead of supplying the list of the separate calibrations found?  Will all new calibrations (and existing things like brighter-fatter kernels, and the in-progress CTI correction) need to have camera hooks as well?  I'm unclear where the line is drawn.

            I'm also wondering how this relates (possibly only in my mind) to calibration provenance.  I assume something like

            camera = my_instrument.load_camera(
                butler,
                day_obs=20250129,
                seq_num=1261,
                ptc=False,
            )

            produces the camera I would use today to process that exposure.  Is there expected to be something similar if I want to see what calibrations were used to process that exposure a year ago, such as passing my collection name to the butler instance?

            Show
            czw Christopher Waters added a comment - - edited The current usage of IsrCalibs has tried to ensure they were compatible with the camera interface.  There is additional information in the stand-alone calib, but so far no algorithm has relied on this extra info. {quote}  This might be tolerable if IsrTask was the only consumer of these calibrations, but it isn't {quote} What other consumer are we expecting for the calibrations?  Is there a benefit for attaching everything to the camera, instead of supplying the list of the separate calibrations found?  Will all new calibrations (and existing things like brighter-fatter kernels, and the in-progress CTI correction) need to have camera hooks as well?  I'm unclear where the line is drawn. I'm also wondering how this relates (possibly only in my mind) to calibration provenance.  I assume something like camera = my_instrument.load_camera( butler, day_obs= 20250129 , seq_num= 1261 , ptc=False, ) produces the camera I would use today to process that exposure.  Is there expected to be something similar if I want to see what calibrations were used to process that exposure a year ago, such as passing my collection name to the butler instance?
            Hide
            jbosch Jim Bosch added a comment -

            The current usage of IsrCalibs has tried to ensure they were compatible with the camera interface.

            That is a relief. Do you have any guesses as to whether that's likely to change?

            Will all new calibrations (and existing things like brighter-fatter kernels, and the in-progress CTI correction) need to have camera hooks as well? I'm unclear where the line is drawn.

            I think the only clear line I'm drawing is that if the volume of data we'd want to attach to the camera is too big (more than a few KB or so?) we shouldn't attach it to the camera. But I'd feel a lot more comfortable having clearer guidelines for what does and doesn't belong, and for that I'm going to ask Robert Lupton to comment, because I think it's directly related to

            What other consumer [beyond {{IsrTask}}] are we expecting for the calibrations?

            and I definitely need to punt this one to Robert Lupton; in our discussion he's alluded to camera-team work and perhaps simulations, but I do agree that exactly how much we care about non-IsrTask users (and the degree to which they are doing things they should actually be doing through IsrTask) is a key part of the motivation for this proposal.

            Show
            jbosch Jim Bosch added a comment - The current usage of IsrCalibs has tried to ensure they were compatible with the camera interface. That is a relief. Do you have any guesses as to whether that's likely to change? Will all new calibrations (and existing things like brighter-fatter kernels, and the in-progress CTI correction) need to have camera hooks as well? I'm unclear where the line is drawn. I think the only clear line I'm drawing is that if the volume of data we'd want to attach to the camera is too big (more than a few KB or so?) we shouldn't attach it to the camera. But I'd feel a lot more comfortable having clearer guidelines for what does and doesn't belong, and for that I'm going to ask Robert Lupton to comment, because I think it's directly related to What other consumer [beyond {{IsrTask}}] are we expecting for the calibrations? and I definitely need to punt this one to Robert Lupton ; in our discussion he's alluded to camera-team work and perhaps simulations, but I do agree that exactly how much we care about non- IsrTask users (and the degree to which they are doing things they should actually be doing through IsrTask ) is a key part of the motivation for this proposal.
            Hide
            czw Christopher Waters added a comment -

            The first thing that will break camera compatibility is inter-chip crosstalk.  Other than that, new calibrations we don't have implementations for are my next concern. 

            I'm happy to wait for the discussion of use cases and what calibrations are included.

             

            Show
            czw Christopher Waters added a comment - The first thing that will break camera compatibility is inter-chip crosstalk.  Other than that, new calibrations we don't have implementations for are my next concern.  I'm happy to wait for the discussion of use cases and what calibrations are included.  
            Hide
            rhl Robert Lupton added a comment -

            As Tim notes, I am concerned about users of this calibration data who are not using DM code.  After a discussion with Jim Bosch, I think that the way forward is to declare that the butler data being discussed here be made normative, and that (as a separate activity) we define an export format which can be read from "external" python code (e.g. FITS; yaml; ...).  We will have to make sure that the exported data can be traced back to its source in the butler (e.g. by explicit UUIDs, or a label within the butler identifying an immutable collection).

            Show
            rhl Robert Lupton added a comment - As Tim notes, I am concerned about users of this calibration data who are not using DM code.  After a discussion with Jim Bosch , I think that the way forward is to declare that the butler data being discussed here be made normative, and that (as a separate activity) we define an export format which can be read from "external" python code (e.g. FITS; yaml; ...).  We will have to make sure that the exported data can be traced back to its source in the butler (e.g. by explicit UUIDs, or a label within the butler identifying an immutable collection).
            Hide
            jchiang James Chiang added a comment -

            Currently, imSim gets static info about the camera it is simulating via the DM interfaces, i.e., we make a Camera object:

            camera = lsst.obs.lsst.LsstCam().getCamera()
            

            and use that object to get CCD- and amp-specific info as well as other properties of the focal plane.

            It sounds like the changes proposed here would require imSim to use a butler, and therefore a data repo with the camera data persisted in it would be needed. Robert's proposal to have a mechanism to export camera data sounds like a way around this, but it would be very convenient if there were a function that constructed a Camera object from those exported data, e.g.,

            camera = read_exported_camera(<location of exported camera data>)
            

            so that we can continue to use those interfaces.

            Show
            jchiang James Chiang added a comment - Currently, imSim gets static info about the camera it is simulating via the DM interfaces, i.e., we make a Camera object: camera = lsst.obs.lsst.LsstCam().getCamera() and use that object to get CCD- and amp-specific info as well as other properties of the focal plane. It sounds like the changes proposed here would require imSim to use a butler, and therefore a data repo with the camera data persisted in it would be needed. Robert's proposal to have a mechanism to export camera data sounds like a way around this, but it would be very convenient if there were a function that constructed a Camera object from those exported data, e.g., camera = read_exported_camera(<location of exported camera data>) so that we can continue to use those interfaces.
            Hide
            tjenness Tim Jenness added a comment -

            There's a big difference between having a "read_exported_camera" that returns a Camera object that only knows about the calibrations that the Camera object already knows about, and designing a new Camera object that returns every new IsrCalib-type calibration that is created. We would at that point mostly be thinking of Camera as a legacy object exported for convenience of existing users.

            On the other hand, a new Python object that is a bucket of IsrCalib objects (for a specific time) and is all in Python is a lot easier to deal with than requiring that there somehow be a hook in the C++ Camera object for every new IsrCalib calibration.

            Show
            tjenness Tim Jenness added a comment - There's a big difference between having a "read_exported_camera" that returns a Camera object that only knows about the calibrations that the Camera object already knows about, and designing a new Camera object that returns every new IsrCalib-type calibration that is created. We would at that point mostly be thinking of Camera as a legacy object exported for convenience of existing users. On the other hand, a new Python object that is a bucket of IsrCalib objects (for a specific time) and is all in Python is a lot easier to deal with than requiring that there somehow be a hook in the C++ Camera object for every new IsrCalib calibration.
            Hide
            tjenness Tim Jenness added a comment -

            I still don't think the C++ Camera object is the best place for this since it can't make use of all the IsrCalib python objects. Jim Bosch you filed this RFC, would you like to comment as to the current status (or hand it over to Christopher Waters)?

            Show
            tjenness Tim Jenness added a comment - I still don't think the C++ Camera object is the best place for this since it can't make use of all the IsrCalib python objects. Jim Bosch you filed this RFC, would you like to comment as to the current status (or hand it over to Christopher Waters )?
            Hide
            jbosch Jim Bosch added a comment -

            Here's a modified proposal:

            • Use the C++ Camera class to store only the nominal/static content, and deprecate most fields that that have IsrCalib alternatives (there may be some where having an approximate nominal/static value is better than having no value, even if that nominal/static value should not be used in production).
            • Add pure-Python classes that wrap the C++ Camera, Detector, and Amplifier classes, with duck-typed interfaces whenever possible. These could also hold IsrCalib objects and provide access to them at the appropriate level.
            • New butler-backed factories would return instances of those new Python classes.
            • The new Python classes would not be persistable as part of an Exposure. I suspect it'd be a mistake to make them persistable as monolithic files, but it may make sense to make them persistable to/from a directory of files (ideally equivalent to a Butler.retrieve_artifacts call on their constituent datasets).
            Show
            jbosch Jim Bosch added a comment - Here's a modified proposal: Use the C++ Camera class to store only the nominal/static content, and deprecate most fields that that have IsrCalib alternatives (there may be some where having an approximate nominal/static value is better than having no value, even if that nominal/static value should not be used in production). Add pure-Python classes that wrap the C++ Camera, Detector, and Amplifier classes, with duck-typed interfaces whenever possible. These could also hold IsrCalib objects and provide access to them at the appropriate level. New butler-backed factories would return instances of those new Python classes. The new Python classes would not be persistable as part of an Exposure. I suspect it'd be a mistake to make them persistable as monolithic files, but it may make sense to make them persistable to/from a directory of files (ideally equivalent to a Butler.retrieve_artifacts call on their constituent datasets).
            Hide
            tjenness Tim Jenness added a comment -

            Jim Bosch I like this proposal. We should consider storing the UUIDs of the calibration datasets in the persistence files as provenance entries so that in theory you could retrieve them later.

            Show
            tjenness Tim Jenness added a comment - Jim Bosch I like this proposal. We should consider storing the UUIDs of the calibration datasets in the persistence files as provenance entries so that in theory you could retrieve them later.
            Hide
            jbosch Jim Bosch added a comment -

            I've repurposed the prototyping ticket linked previously as the implementation ticket, since it wasn't going to be merged as-is anyway, and adopted.

            Actually scheduling the work is another story, however; I'm not at all sure how to prioritize it (or even who I'd ask to do it, other than myself and perhaps Christopher Waters).

            Show
            jbosch Jim Bosch added a comment - I've repurposed the prototyping ticket linked previously as the implementation ticket, since it wasn't going to be merged as-is anyway, and adopted. Actually scheduling the work is another story, however; I'm not at all sure how to prioritize it (or even who I'd ask to do it, other than myself and perhaps Christopher Waters ).
            Hide
            czw Christopher Waters added a comment -

            I'm happy to work on this, but I have it ranked as "least urgent" in my head.

            Show
            czw Christopher Waters added a comment - I'm happy to work on this, but I have it ranked as "least urgent" in my head.

              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              jbosch Jim Bosch
              Watchers:
              Christopher Waters, James Chiang, Jim Bosch, John Parejko, Kian-Tat Lim, Robert Lupton, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated:
                Planned End:

                  Jenkins

                  No builds found.