DM-22370 (Data Management)

Support Gen 3 ingestion of image-type calibs

    Details

    • Story Points:
      6
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      There is currently no general-purpose task for ingesting "ordinary" calibration data (flats, biases, etc.), analogous to the Gen 2 pipe.tasks.IngestCalibsTask. We would like to be able to ingest such files from a non-repository location into a Gen 3 repository, without providing any information at the code/config level that was automatically inferred in Gen 2.

            Activity

            tjenness Tim Jenness added a comment -

            Is this ticket trying to ingest calibrations that we have created ourselves (e.g. in pipe_drivers or cp_pipe) or are these external calibrations (like DECam community pipeline)?

            Do we expect sufficient metadata to be present in the files or are we providing extra metadata on the commandline? Do we know which instrument from the files or do we specify it separately? How do we specify it? Which filter? Are we doing metadata extraction in a generic way as for astro_metadata_translator? Which exposures went into the calibration?

            Currently in gen3 for curated calibrations from text files we do a direct butler.put.

            krzys Krzysztof Findeisen added a comment - - edited

            My understanding is that John Parejko's work will most likely make this issue obsolete, but that we don't yet have an alternative system. So I'm not sure we need to nail down more specific requirements. That said...

            > Is this ticket trying to ingest calibrations that we have created ourselves (e.g. in pipe_drivers or cp_pipe) or are these external calibrations (like DECam community pipeline)?

            The original intent was "both".

            > Do we expect sufficient metadata to be present in the files or are we providing extra metadata on the commandline?

            My meaning with "without providing any information... that was automatically inferred in Gen 2" was that we should not narrow the definition of "sufficient metadata" in going from Gen 2 to Gen 3. I would of course prefer that everything be automated, but I take it that there is Gen 3 metadata that doesn't have a Gen 2 equivalent?

            > Do we know which instrument from the files or do we specify it separately? How do we specify it? Which filter?

            No requirement (except that AFAIK filter is covered by the previous point, in that it was automatically inferred in Gen 2).

            > Are we doing metadata extraction in a generic way as for astro_metadata_translator? Which exposures went into the calibration?

            I'm not sure I know enough about astro_metadata_translator to answer either of these questions.

            tjenness Tim Jenness added a comment -

            Gen2 calibration ingest only looks at a very small set of headers (CALIB_ID being the magic one, but I think also OBSTYPE, to tell bias from flat). pipe_drivers adds these headers but recently also started propagating all the metadata. Most old calibrations don't know what instrument they are from. Gen2 doesn't know about instruments; it ingests what it's given and assumes every calibration has the same headers (I don't know how it handles cpBias etc., which don't have CALIB_ID). Gen3 absolutely requires you to tell it which instrument is involved, so I guess that has to follow whatever we do for the ingestImages.py replacement and get a command-line switch that maps to a gen3 Instrument class.
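
To make the header-based approach described above concrete, here is a minimal sketch of how a calibration's type and identifying fields might be pulled from those two headers. It is not the Gen2 implementation. The assumption that CALIB_ID holds space-separated `key=value` pairs is not stated in this thread, the function names are hypothetical, and a plain dict stands in for a FITS header.

```python
import re


def parse_calib_id(calib_id):
    """Split a 'key=value key=value' string into a dict.

    Assumption: CALIB_ID uses space-separated key=value pairs.
    """
    return dict(re.findall(r"(\w+)=(\S+)", calib_id))


def describe_calib(header):
    """Return (calib type, CALIB_ID fields) from a header-like mapping.

    The calib type is the lower-cased OBSTYPE value, or None if absent.
    Files without CALIB_ID yield an empty field dict.
    """
    obstype = str(header.get("OBSTYPE", "")).strip().lower()
    calib_id = header.get("CALIB_ID")
    fields = parse_calib_id(calib_id) if calib_id else {}
    return obstype or None, fields


# Hypothetical header values, for illustration only.
header = {"OBSTYPE": "FLAT", "CALIB_ID": "filter=g ccd=10 calibDate=2015-02-17"}
calib_type, fields = describe_calib(header)
print(calib_type, fields)
```

Note that nothing in such a header identifies the instrument, which is why a Gen3 equivalent would need it supplied separately.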

            tjenness Tim Jenness added a comment -

            Krzysztof Findeisen, this ticket is listed as a blocker for gen2 deprecation, but after discussing this today with Jim Bosch we aren't sure whether the ticket as it stands should be a blocker.

            In gen3 the calibrations are made inside the repository and so there is no need to ingest them. There is no equivalent to pipe_drivers that runs completely independently. I'm not sure we know the use case for people downloading calibrations from gen3 standalone and then independently ingesting them into their own gen3 repository. Is that a requirement? I think the system is assuming that would be done as a butler export which comes with a metadata yaml file.

            For DECam community pipeline calibrations and CFHT Elixir calibrations I think we need to write special importers for them. That would be a bit like ingest-raws in the sense that you would have to declare somewhere that you are ingesting a specific calibration product from an external source. We can have distinct Jira tickets for DECam and CFHT (DECam is partly done already but only in 2to3 and so does not include metadata extraction from the headers) if you like.

            Or do you really want a gen3 script for ingesting specifically gen2 master calibrations (with validity ranges supplied)? Or do you want a gen3 script for ingesting gen3 outputs from cp_pipe that are not attached to a registry?

            krzys Krzysztof Findeisen added a comment -

            A special importer might be doable for DECam. At least, I don't think it would directly affect ap_verify, since we'd be running the custom ingester as part of repository creation/maintenance, not as part of ap_verify proper.

            However, I don't know where else we use external calib files like the CP products; maybe Meredith Rawls would have a better idea?

            mrawls Meredith Rawls added a comment -

            Given that we are able to use cp_pipe to build Science Pipelines biases and flats for DECam, I don't see a strong need for supporting DECam community pipeline mastercalibs directly in gen3. We could write a separate importer as Tim says.

            That said, right now with gen2, I can run a set of commands to go from "pile of raw DECam images" to "processed data products." This is essential for many science users, and it's something I will be doing occasionally to build new calibs for my HiTS2015 DECam dataset to make sure we understand e.g. how changes in cp_pipe may affect results from the AP Pipeline.

            I think this ticket should be to ensure there is a functional and documented gen3 equivalent of steps 1-9 below. The example I provide here is for DECam, but I am highly interested in reproducing this workflow for HSC and other cameras as well; I just have no idea how to do it.

            1. Make a new empty directory and turn it into a repo (e.g., put a _mapper file in it for gen2)
            2. Put all your raw calibs and science images in a subdirectory `raw` and also make an empty `calib` subdirectory
            3. ingestImagesDecam.py . --mode=link raw/*
            4. ingestCuratedCalibs.py . --calib calib $OBS_DECAM_DATA_DIR/decam/defects
            5. ingestCuratedCalibs.py . --calib calib $OBS_DECAM_DATA_DIR/decam/crosstalk
            6. constructBias.py . --calib calib --output calib_construction --id visit=biasVisit1^biasVisit2^biasVisitN --batch-type none
            7. ingestCalibs.py . --calib calib --mode=link --validity 999 calib_construction/BIAS/[date]/*.fits
            8. constructFlat.py . --calib calib --output calib_construction --id visit=flatVisit1^flatVisit2^flatVisitN --config isr.doDark=False --batch-type none
            9. ingestCalibs.py . --calib calib --mode=link --validity 999 calib_construction/FLAT/[date]/[filter]/*.fits
            10. Make a ref_cats subdirectory and add symlinks to Pan-STARRS and gaia-dr2 reference catalogs for photometry and astrometry
            11. processCcd.py . --calib calib --rerun [rerun] --id visit=scienceVisit1^scienceVisit2^scienceVisitN --config isr.biasDataProductName='bias' isr.flatDataProductName='flat'
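
The `--id visit=...^...` arguments in steps 6, 8 and 11 are tedious to write by hand for long visit lists. As a rough sketch, a small helper could assemble the step 6 command from a list of visit numbers. The helper names and the visit numbers are hypothetical; the flags are copied from step 6 above, and the command is only built and printed, not run.

```python
import shlex


def visit_id_arg(visits):
    """Join visit numbers with '^', giving the value passed to --id."""
    if not visits:
        raise ValueError("at least one visit is required")
    return "visit=" + "^".join(str(v) for v in visits)


def construct_bias_command(repo, visits, calib="calib", output="calib_construction"):
    """Build the argument list for the constructBias.py call in step 6."""
    return [
        "constructBias.py", repo,
        "--calib", calib,
        "--output", output,
        "--id", visit_id_arg(visits),
        "--batch-type", "none",
    ]


# Hypothetical bias visit numbers.
cmd = construct_bias_command(".", [1001, 1002, 1003])
print(shlex.join(cmd))  # shell-quoted form, suitable for pasting into a terminal
```

Keeping the command as a list of arguments avoids shell-quoting problems if it is later passed to `subprocess.run`.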
            tjenness Tim Jenness added a comment -

            I think steps 1 to 9 in gen3 are butler ingest-raws + cp_pipe + certification. No explicit ingest of external calibrations. I think this means that we don't need an ingestCalibs.py in the short term.

            tjenness Tim Jenness added a comment -

            I'm going to remove this as a gen2 deprecation blocker. If people really need an ingestCalibs replacement specifically for gen2 pipe_drivers outputs then please adjust this ticket. If it's needed before gen3 is usable for people then I really need to know.

            Please create separate tickets for ingesting external DECam community pipeline calibrations if that is needed.


              People

              • Assignee:
                Unassigned
                Reporter:
                krzys Krzysztof Findeisen
                Watchers:
                Christopher Waters, Ian Sullivan, Jim Bosch, John Parejko, Krzysztof Findeisen, Meredith Rawls, Tim Jenness