Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-29370

Allow butler gen3 to retrieve amplifiers from raw

    XMLWordPrintable

    Details

    • Story Points:
      8
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      Eric Charles is doing EO testing and would like to be able to access the unadulterated raw pixel data for each amplifier. Currently he has to retrieve the raw with bounding boxes and has to revert flipping. This can be inefficient for a single amp.

      The request is to add a raw.amp derived component that can return the amp pixel data and header exactly as it is stored in the FITS file.

      Additional questions:

      • Should there be a raw.amps to return all the amps in a dict/list?
      • How does the amp number get passed to butler.get?
      • Should bounding boxes be allowed to request a subset of an amplifier?

      There might need to be some changes to daf_butler to support parameters for the derived component since currently all parameters are handled by the composite before the result is given to the derived component (so passing in bbox refers to a subset of the entire image before the derived component sees it).

        Attachments

          Issue Links

            Activity

            Hide
            tjenness Tim Jenness added a comment -

            Given that the amplifier has the same python type as the raw itself, it seems like a better answer is simply to add the amp as an integer parameter. A derived component makes sense if you are returning a different python type so raw.amps would have to be a derived component (since it returns a dict of Exposure) but selecting an amp could be all done with the standard parameter system.

            Show
            tjenness Tim Jenness added a comment - Given that the amplifier has the same python type as the raw itself, it seems like a better answer is simply to add the amp as an integer parameter. A derived component makes sense if you are returning a different python type so raw.amps would have to be a derived component (since it returns a dict of Exposure) but selecting an amp could be all done with the standard parameter system.
            Hide
            echarles Eric Charles added a comment - - edited

            Some considerations from the user side:

            • Memory usage and access in pipeline tasks.  Some obvious use cases are for stacking and stability studies, there we want to avoid having to read the whole CCD to get the data for a particular AMP, since we will be stacking / analyzing the data per amp.   So, I'm wondering:
              • How to get the list of available amps for a given ccd without reading any data?
              • How does this work with data references or DeferredDatasetHandles.   I.e., does it work doing either butler.get("raw.amp" ...) or handle.get().    For stacking and stability studies we will want to loop over the amps first, and the exposure will be the inner loop.
            • Coordinates / bounding boxes of the returned object.   In particular, how do we access particular sub-regions of the returned object?  I.e., what exactly do we get if the ask the returned object for getDetector(), is it just the amplifier info for the associated amp?   How are the flips handled.   (I think these questions are pretty much the same as the ones in the description, albeit asking specifically if exposure.getDetector() will still be the way to access sub-region information, and if there are any specific changes that are needed get the correct information.
            • Efficiency.   Assemble the CCD took a sub-dominant but noticable amount of time in the bias studies I did in gen2.   If we are going to de-assemble the data to the amplifier level anyway, then avoiding the assemble time would be preferable.  
              • This time is sub-dominant, so this is not critical, but it would be preferable.
            • Writeability.  For doing bias studies would be nice to be able to write back data in the same format.  This pretty secondary, but it could be useful.   E.g., if we are testing bias subtraction algorithms it would be very useful to work entirely in per-amp space, and writing various types of stacked images in the original format.

             

             

            Show
            echarles Eric Charles added a comment - - edited Some considerations from the user side: Memory usage and access in pipeline tasks.  Some obvious use cases are for stacking and stability studies, there we want to avoid having to read the whole CCD to get the data for a particular AMP, since we will be stacking / analyzing the data per amp.   So, I'm wondering: How to get the list of available amps for a given ccd without reading any data? How does this work with data references or DeferredDatasetHandles.   I.e., does it work doing either butler.get("raw.amp" ...) or handle.get().    For stacking and stability studies we will want to loop over the amps first, and the exposure will be the inner loop. Coordinates / bounding boxes of the returned object.   In particular, how do we access particular sub-regions of the returned object?  I.e., what exactly do we get if the ask the returned object for getDetector(), is it just the amplifier info for the associated amp?   How are the flips handled.   (I think these questions are pretty much the same as the ones in the description, albeit asking specifically if exposure.getDetector() will still be the way to access sub-region information, and if there are any specific changes that are needed get the correct information. Efficiency.   Assemble the CCD took a sub-dominant but noticable amount of time in the bias studies I did in gen2.   If we are going to de-assemble the data to the amplifier level anyway, then avoiding the assemble time would be preferable.   This time is sub-dominant, so this is not critical, but it would be preferable. Writeability.  For doing bias studies would be nice to be able to write back data in the same format.  This pretty secondary, but it could be useful.   E.g., if we are testing bias subtraction algorithms it would be very useful to work entirely in per-amp space, and writing various types of stacked images in the original format.    
            Hide
            tjenness Tim Jenness added a comment -

            If we are returning an Exposure of a single amp then surely it wouldn't have a Detector attached to it at all. I have no idea if an Exposure can be associated with a specific amp definition.

            Show
            tjenness Tim Jenness added a comment - If we are returning an Exposure of a single amp then surely it wouldn't have a Detector attached to it at all. I have no idea if an Exposure can be associated with a specific amp definition.
            Hide
            echarles Eric Charles added a comment -

            How will I know determine what the bounding boxes for the various sub-regions are?  

             

            Show
            echarles Eric Charles added a comment - How will I know determine what the bounding boxes for the various sub-regions are?    
            Hide
            tjenness Tim Jenness added a comment -

            That's for Robert Lupton – I assume he solves this somehow in gen2. I don't know what it means for amplifiers to have sub-regions.

            Show
            tjenness Tim Jenness added a comment - That's for Robert Lupton – I assume he solves this somehow in gen2. I don't know what it means for amplifiers to have sub-regions.
            Hide
            echarles Eric Charles added a comment -

            Sub-regions as in the imaging region and the serial and parallel overscan regions.

            Show
            echarles Eric Charles added a comment - Sub-regions as in the imaging region and the serial and parallel overscan regions.
            Hide
            jbosch Jim Bosch added a comment - - edited

            I think we definitely need to attach a Detector of some kind to the Exposure this method returns, or it won't be of much use, and it seems like constructing that out of in-untrimmed-assembled-coordinates Detector objects that come directly from the cameraGeom will be the vast majority of the work on this ticket.

            Circa DM-18610 I had big plans for making cameraGeom more amenable to that sort of thing, which got handed off to Christopher Waters because Gen3 was (still) calling, and then it promptly got buried beneath a pile of CPP and ISR issues for Chris. As it stands, the cameraGeom design is supposed to only ever represent the bounding boxes in a particular, standard state (post-assembly for detector-level boxes, assembled+untrimmed for amplifier boxes), because all other boxes for other states can be derived from that, and it's supposed to be true that if you do butler.get("camera", ...)[N] or butler.get("raw", detector=N, ...).getDetector() or butler.get("calexp", detector=N, ...).getDetector(), you get exactly the same thing each time, because the cameraGeom isn't supposed to know the assembly state and certainly isn't supposed to have bounding boxes that depend on it.

            All that said, I believe there is long-standing code in various places (including the obs_lsst raw formatters, IIRC) that basically ignores that and does treat the Detector and Amplifiers as things whose bounding boxes should be modified as the Exposure they are attached to is, and, well, it mostly works anyway - if you don't read the afw docstrings about what various bboxes are supposed to mean too carefully, and are willing to ignore the various attributes that don't make sense at all for some states, and don't actually expect those statements about butler.get consistency. It drives me absolutely bonkers, but the people who work with it regularly seem to be doing okay, and I just don't have time (and I don't think Chris has time) to really fix it.

            So, I am not optimistic about us solving that well anytime soon - at some level I think it's a symptom of us just declaring every image an Exposure instead of having different classes to represent different kinds of images and the states they can be in - and while whatever we do on this ticket needs to involve providing useful cameraGeom somehow, we need to be prepared to hold our noses about that making sense as part of a bigger picture if we want it to get done.

            Show
            jbosch Jim Bosch added a comment - - edited I think we definitely need to attach a Detector of some kind to the Exposure this method returns, or it won't be of much use, and it seems like constructing that out of in-untrimmed-assembled-coordinates Detector objects that come directly from the cameraGeom will be the vast majority of the work on this ticket. Circa DM-18610 I had big plans for making cameraGeom more amenable to that sort of thing, which got handed off to Christopher Waters because Gen3 was (still) calling, and then it promptly got buried beneath a pile of CPP and ISR issues for Chris. As it stands, the cameraGeom design is supposed to only ever represent the bounding boxes in a particular, standard state (post-assembly for detector-level boxes, assembled+untrimmed for amplifier boxes), because all other boxes for other states can be derived from that, and it's supposed to be true that if you do butler.get("camera", ...) [N] or butler.get("raw", detector=N, ...).getDetector() or butler.get("calexp", detector=N, ...).getDetector() , you get exactly the same thing each time, because the cameraGeom isn't supposed to know the assembly state and certainly isn't supposed to have bounding boxes that depend on it. All that said, I believe there is long-standing code in various places (including the obs_lsst raw formatters, IIRC) that basically ignores that and does treat the Detector and Amplifiers as things whose bounding boxes should be modified as the Exposure they are attached to is, and, well, it mostly works anyway - if you don't read the afw docstrings about what various bboxes are supposed to mean too carefully, and are willing to ignore the various attributes that don't make sense at all for some states, and don't actually expect those statements about butler.get consistency. It drives me absolutely bonkers, but the people who work with it regularly seem to be doing okay, and I just don't have time (and I don't think Chris has time) to really fix it. So, I am not optimistic about us solving that well anytime soon - at some level I think it's a symptom of us just declaring every image an Exposure instead of having different classes to represent different kinds of images and the states they can be in - and while whatever we do on this ticket needs to involve providing useful cameraGeom somehow, we need to be prepared to hold our noses about that making sense as part of a bigger picture if we want it to get done.
            Hide
            jbosch Jim Bosch added a comment -

            I think this ticket is no longer blocked by DM-29505 prototyping (which I've indicated by closing that ticket). That was a useful thought experiment and it may come back someday, but for now we've agreed that the best way to represent the camera geometry of the single-amplifier image returned by the functionality requested by this ticket is just to attach an afw.image.Detector object that contains a single amplifier in the appropriate state.

            For LSST data, that should be straightforward because the Amplifier objects start out in a state that matches the actual raw data (not the "raw" that butler returns), and are transformed in the raw formatter code. So we just need to intercept those Amplifier objects early and return them before that transformation. For other instruments that have preassembled/untrimmed raws, we can also just return the Amplifier objects we start with, because those don't ever change. In both cases we'll be asking the Detector to "rebuild" itself with just one Amplifier.

            I think that's enough information for either Tim Jenness or myself to do the work, though I'm not sure how much Tim knows about those CameraGeom APIs, so it might make more sense for me to do it.

            I think we might want a new StorageClass or two that also maps to afw.image.Exposure, for raw and raw.amp, to enable this, though trying to redefine the raw dataset type in a bunch of live repos sounds unpleasant in a character-building, "I guess this is something we need to be able to do" kind of way. It'd be easier to just add an amp read-component to the Exposure StorageClass and just expect that most formatters won't implement it.

            We will also need a new ticket to enable reading single amplifiers from at least bias images; that will be trickier (we need to essentially write the inverse of the LSST raw formatter assembly code), and Robert Lupton has offered to write the guts of it. I imagine that means biases and perhaps some other master calibs would also use the new StorageClass, if we have one.

            Show
            jbosch Jim Bosch added a comment - I think this ticket is no longer blocked by DM-29505 prototyping (which I've indicated by closing that ticket). That was a useful thought experiment and it may come back someday, but for now we've agreed that the best way to represent the camera geometry of the single-amplifier image returned by the functionality requested by this ticket is just to attach an afw.image.Detector object that contains a single amplifier in the appropriate state. For LSST data, that should be straightforward because the Amplifier objects start out in a state that matches the actual raw data (not the "raw" that butler returns), and are transformed in the raw formatter code. So we just need to intercept those Amplifier objects early and return them before that transformation. For other instruments that have preassembled/untrimmed raws, we can also just return the Amplifier objects we start with, because those don't ever change. In both cases we'll be asking the Detector to "rebuild" itself with just one Amplifier . I think that's enough information for either Tim Jenness or myself to do the work, though I'm not sure how much Tim knows about those CameraGeom APIs, so it might make more sense for me to do it. I think we might want a new StorageClass or two that also maps to afw.image.Exposure , for raw and raw.amp , to enable this, though trying to redefine the raw dataset type in a bunch of live repos sounds unpleasant in a character-building, "I guess this is something we need to be able to do" kind of way. It'd be easier to just add an amp read-component to the Exposure StorageClass and just expect that most formatters won't implement it. We will also need a new ticket to enable reading single amplifiers from at least bias images; that will be trickier (we need to essentially write the inverse of the LSST raw formatter assembly code), and Robert Lupton has offered to write the guts of it. I imagine that means biases and perhaps some other master calibs would also use the new StorageClass, if we have one.
            Hide
            jbosch Jim Bosch added a comment -

            This took much longer than expected because I kept finding new little things that needed fixing along the way, and testing this kind of geometric bookkeeping is always a pain, but I think this is finally ready for review.

            It also includes everything that I'd originally planned to defer to DM-29732, because it turned out that was essentially the same as making it work for non-LSST raws, and I wanted the logic to test the LSST raw case, too.

            Robert Lupton, I'm hoping you can review; this'll be your chance to move out of last place on the Science Pipelines reviewing leaderboard , but it's a shame you won't get credit there for a larger-than-average ticket.

            Changes are split across five packages (as usual, Jira doesn't see the afw one), and I've tried to define the commits to make reviewing commit-by-commit more digestible than looking at the full changes at once.  The PRs are (in the order I'd recommend looking at them):

            • utils: adds some flexibility to NaN comparisons in our workhorse array-comparison test assertion.
            • daf_butler: just some YAML declarations for the new read parameters, but that includes the detailed documentation (as YAML comments, sadly; DM-29985) for how to use the new parameters and what exactly one can expect to get back.
            • afw: most of the new geometric logic and a long-overdue redocumentation of amp attributes to reflect how they are actually used.
            • obs_base: support for per-amp reads in the base formatter class for raws (this implementation is inherited directly by all but obs_lsst) and the concrete formatter for non-raw Exposures.  Also support for Butler.get("raw.detector").
            • obs_lsst: support for per-amp reads in the raw formatter class, because the base class implementation won't work given its unique (among DM-maintained obs packages) on-disk layout.  Also fixes to make Butler.get("raw.detector") return the right thing.

             

            Show
            jbosch Jim Bosch added a comment - This took much longer than expected because I kept finding new little things that needed fixing along the way, and testing this kind of geometric bookkeeping is always a pain, but I think this is finally ready for review. It also includes everything that I'd originally planned to defer to DM-29732 , because it turned out that was essentially the same as making it work for non-LSST raws, and I wanted the logic to test the LSST raw case, too. Robert Lupton , I'm hoping you can review; this'll be your chance to move out of last place on the Science Pipelines reviewing leaderboard , but it's a shame you won't get credit there for a larger-than-average ticket. Changes are split across five packages (as usual, Jira doesn't see the afw one), and I've tried to define the commits to make reviewing commit-by-commit more digestible than looking at the full changes at once.  The PRs are (in the order I'd recommend looking at them): utils : adds some flexibility to NaN comparisons in our workhorse array-comparison test assertion. daf_butler : just some YAML declarations for the new read parameters, but that includes the detailed documentation (as YAML comments, sadly; DM-29985 ) for how to use the new parameters and what exactly one can expect to get back. afw : most of the new geometric logic and a long-overdue redocumentation of amp attributes to reflect how they are actually used. obs_base : support for per-amp reads in the base formatter class for raws (this implementation is inherited directly by all but obs_lsst) and the concrete formatter for non-raw Exposures.  Also support for Butler.get("raw.detector"). obs_lsst : support for per-amp reads in the raw formatter class, because the base class implementation won't work given its unique (among DM-maintained obs packages) on-disk layout.  Also fixes to make Butler.get("raw.detector") return the right thing.  
            Hide
            jbosch Jim Bosch added a comment -

            Christopher Waters, here's that review-transfer from Robert Lupton I mentioned.

            Show
            jbosch Jim Bosch added a comment - Christopher Waters , here's that review-transfer from Robert Lupton I mentioned.
            Hide
            czw Christopher Waters added a comment -

            A few comments on PR, nothing I think is major.

            Show
            czw Christopher Waters added a comment - A few comments on PR, nothing I think is major.

              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              tjenness Tim Jenness
              Reviewers:
              Christopher Waters
              Watchers:
              Christopher Waters, Eric Charles, James Chiang, Jim Bosch, Robert Lupton, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.