Data Management / DM-22370

Support Gen 3 ingestion of image-type calibs


    Details

    • Story Points:
      6
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      There is currently no general-purpose task for ingesting "ordinary" calibration data (flats, biases, etc.), analogous to the Gen 2 pipe.tasks.IngestCalibsTask. We would like to be able to ingest such files from a non-repository location into a Gen 3 repository, without providing any information at the code/config level that was automatically inferred in Gen 2.

            Activity

            Tim Jenness added a comment -

            I'm going to remove this as a gen2 deprecation blocker. If people really need an ingestCalibs replacement specifically for gen2 pipe_drivers outputs, then please adjust this ticket. If it's needed before gen3 is usable for people, then I really need to know.

            Please create separate tickets for ingesting external DECam community pipeline calibrations if that is needed.

            Tim Jenness added a comment -

            Another option is to document the subset of YAML that we use for export/import and ask people to write it manually. The usual blocker for a generic ingest script is how to work out the dataId. We could think of a simple CSV format where you specify a single dataset type and formatter and then list rows in a CSV table like:

            file,exposure,detector
            a.fits,1234566,52
            b.fits,124556,12
            

            and ingest that (allowing multiple rows for the same file if needed). Then you run:

            $ butler ingest-files REPO datasetType formatter_class myfile.csv --transfer=auto
            

            This would work fairly well and would not require the generic ingester to try to read the file headers to work out what's going on. It breaks down if these dataIds are not known to the system (ingest-raws creates exposure dimension records, for example), but it might be a quick way to get something working for people.
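
            As a hypothetical illustration (not part of the proposal itself), a CSV like that could be produced by a small script that scrapes the dataId columns out of the FITS headers; the header keywords (EXPID, CCDNUM) and directory layout below are placeholders that would differ per camera:

            # Hypothetical sketch: build the proposed CSV by reading dataId values
            # from FITS headers. EXPID/CCDNUM are placeholder keywords.
            import csv
            import glob

            from astropy.io import fits

            with open("myfile.csv", "w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(["file", "exposure", "detector"])
                for path in sorted(glob.glob("calibs/*.fits")):
                    header = fits.getheader(path)
                    writer.writerow([path, header["EXPID"], header["CCDNUM"]])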

            Meredith Rawls added a comment - edited

            I think this ticket was asking for this approximate workflow of butler commands that now exist: create, register-instrument, write-curated-calibrations, ingest-raws, [insert running appropriate cp_pipe pipelines here], certify-calibrations, profit. Well, that last one sadly doesn't exist. I think define-visits needs to be in there too.

            I agree it's not a gen2 deprecation blocker. I do think a generic ingest script like you suggest would be helpful, or at the very least documentation (ideally tutorial-style) for the steps I just listed.
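
            As a loose sketch only, the first few of those commands map onto the Python API roughly as follows (HSC and the repository path "REPO" are assumptions made for the example; ingest-raws, define-visits, the cp_pipe runs, and certify-calibrations are easiest left to the command line):

            # Rough Python-API equivalent of the first steps of the workflow above;
            # the repository path and instrument choice are made up for illustration.
            from lsst.daf.butler import Butler
            from lsst.obs.subaru import HyperSuprimeCam

            Butler.makeRepo("REPO")                        # butler create REPO
            butler = Butler("REPO", writeable=True)

            instrument = HyperSuprimeCam()
            instrument.register(butler.registry)           # butler register-instrument
            instrument.writeCuratedCalibrations(butler)    # butler write-curated-calibrations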

            Tim Jenness added a comment -

            There is no way at the moment to transfer an image calibration file made outside of butler into a gen3 butler. Your only option is to run ingestCalibs.py on a gen2 butler and convert to gen3. Steps 1 to 9 can be done by gen3 on its own, but that doesn't solve the problem that, if you have an external calibration, you haven't got a step 7 for it (no ingestCalibs.py equivalent).

            Documenting how to write an ingest of an arbitrary file would probably be quite detailed, but an "ingest-files" command taking a CSV might well be the easy way around this and would let you cobble up a script to parse the headers yourself to generate the CSV file.
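
            For what it's worth, the underlying Python ingest path is Butler.ingest with FileDataset; a rough sketch follows, with the caveat that the exact constructor signatures have shifted between middleware releases, and that the dataset type definition, dataId values, and run name are invented for the example (it also assumes the instrument has already been registered so the dimension records exist):

            # Rough sketch only: ingest one externally produced flat via the Python API.
            # The dataset type definition, dataId values, and run name are illustrative.
            from lsst.daf.butler import Butler, DatasetRef, DatasetType, FileDataset

            run = "u/someone/external-flats"
            butler = Butler("REPO", writeable=True, run=run)

            flatType = DatasetType(
                "flat",
                dimensions=("instrument", "detector", "physical_filter"),
                storageClass="ExposureF",
                universe=butler.registry.dimensions,
                isCalibration=True,
            )
            butler.registry.registerDatasetType(flatType)

            dataId = {"instrument": "HSC", "detector": 52, "physical_filter": "HSC-R"}
            ref = DatasetRef(flatType, dataId, run=run)
            butler.ingest(FileDataset(path="a.fits", refs=[ref]), transfer="copy")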

            Tim Jenness added a comment -

            butler ingest-files can be used as part of this: scan the files, calculate the dataIds, then call ingest-files. Validity ranges would then have to be set with a different API call. A unified script that can do all three parts of this in one is a possibility, but we need some idea of the importance of such a script and how generic it would have to be.
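
            The "different API call" for validity ranges would presumably be Registry.certify against a CALIBRATION collection; a rough sketch, with invented collection names, dataset type, and dates:

            # Rough sketch: attach a validity range to previously ingested calibs.
            # Collection names, dataset type, and the timespan are made up.
            from astropy.time import Time

            from lsst.daf.butler import Butler, CollectionType, Timespan

            butler = Butler("REPO", writeable=True)

            run = "u/someone/external-flats"            # RUN the files were ingested into
            calibCollection = "HSC/calib/external"      # CALIBRATION collection to certify into
            butler.registry.registerCollection(calibCollection, CollectionType.CALIBRATION)

            refs = set(butler.registry.queryDatasets("flat", collections=run))
            timespan = Timespan(Time("2020-01-01", scale="tai"), Time("2021-01-01", scale="tai"))
            butler.registry.certify(calibCollection, refs, timespan)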


              People

              Assignee:
              Unassigned
              Reporter:
              Krzysztof Findeisen
              Watchers:
              Christopher Waters, Ian Sullivan, Jim Bosch, John Parejko, Kenneth Herner, Meredith Rawls, Tim Jenness
              Votes:
              0

                Dates

                Created:
                Updated:

                  Jenkins

                  No builds found.