Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-22708

Fix decam gen3 ingest

    XMLWordPrintable

    Details

      Description

      It turns out the work I did on decam for DM-20763 was not complete: I only tested that one dataId could be retrieved from the ingested raw data, but there are two ccds in that file in two different HDUs and the second one is not actually getting ingested. The gen2 CameraMapper path for a decam raw looks like decam%(visit)07d.fits.fz[%(hdu)d], so the hdu number is baked in there. As far as I can tell, gen3 doesn't have any way to encode hdu number. Maybe this has to be a further specialization in the RawFormatter. File paths that go into butler.ingest have to actually exist, so we will have to do something to the raw paths that we pass into `butler.ingest`.

      The gen2 butler registry has an `hdu` field that encodes which hdu to get that raw from. It doesn't look like there's an equivalent `hdu` field anywhere in the gen3 registry: could we add an extra `hdu` field to the `posix_datastore_records` table? That's my best guess as to where it should live.

        Attachments

          Issue Links

            Activity

            Hide
            swinbank John Swinbank added a comment -

            (Blocked waiting for input from the middleware team.)

            Show
            swinbank John Swinbank added a comment - (Blocked waiting for input from the middleware team.)
            Hide
            tjenness Tim Jenness added a comment -

            Gen3 can now accept multiple datasets associated with a single file. You need to create multiple DatasetRef – one per detector. I've also changed formatters so that they now can access the dataId. This means that you can write a DecamRawFormatter that understands it needs to get the detector number from the data ID and use that to choose the correct extension from the file (possibly by looking up the mapping from detector number to extension number in the header).

            Show
            tjenness Tim Jenness added a comment - Gen3 can now accept multiple datasets associated with a single file. You need to create multiple DatasetRef – one per detector. I've also changed formatters so that they now can access the dataId. This means that you can write a DecamRawFormatter that understands it needs to get the detector number from the data ID and use that to choose the correct extension from the file (possibly by looking up the mapping from detector number to extension number in the header).
            Hide
            Parejkoj John Parejko added a comment -

            Note to self: once I get basic ingestion working for decam here, I should time it vs. gen2 ingestion using the hits2015 data to see how it compares speed-wise.

            Show
            Parejkoj John Parejko added a comment - Note to self: once I get basic ingestion working for decam here, I should time it vs. gen2 ingestion using the hits2015 data to see how it compares speed-wise.
            Hide
            Parejkoj John Parejko added a comment - - edited

            I think I've got this working! I haven't run a test on the full hits2015 yet, but the test I wrote much earlier now passes (both hdus of testdata_decam get ingested).

            Jenkins: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/31084/pipeline

            Show
            Parejkoj John Parejko added a comment - - edited I think I've got this working! I haven't run a test on the full hits2015 yet, but the test I wrote much earlier now passes (both hdus of testdata_decam get ingested). Jenkins: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/31084/pipeline
            Hide
            Parejkoj John Parejko added a comment -

            Tim Jenness, would you be able to review these changes? It's about 200 lines of python, plus 10 lines of pybind11 and a new decam zeroed-image FITS file.

            Show
            Parejkoj John Parejko added a comment - Tim Jenness , would you be able to review these changes? It's about 200 lines of python, plus 10 lines of pybind11 and a new decam zeroed-image FITS file.
            Hide
            tjenness Tim Jenness added a comment -

            Looks good. Some minor comments on PRs.

            Show
            tjenness Tim Jenness added a comment - Looks good. Some minor comments on PRs.
            Hide
            Parejkoj John Parejko added a comment - - edited

            Update with some timing information using ap_verify_hits2015 data on my desktop

            • gen2 ingest of raw+calib+defects takes about 60 seconds (defects is maybe 5s of that).
            • gen3 ingest of just raws takes about 180 seconds without calculating checksums.

            Thus, it looks like gen3 is about 6x slower (~half of the gen2 time is raws). I've attached a call graph for the gen3 ingest: visitInfo/obsInfo and region calculation seem to be high bars (~30% overall), with `makeSkyWcs` coming in at a rather surprising 10%. Those calculations are no doubt a necessary part of the tract/visit mapping, but it might be worth seeing if we can speed any of them up.

            Show
            Parejkoj John Parejko added a comment - - edited Update with some timing information using ap_verify_hits2015 data on my desktop gen2 ingest of raw+calib+defects takes about 60 seconds (defects is maybe 5s of that). gen3 ingest of just raws takes about 180 seconds without calculating checksums. Thus, it looks like gen3 is about 6x slower (~half of the gen2 time is raws). I've attached a call graph for the gen3 ingest: visitInfo/obsInfo and region calculation seem to be high bars (~30% overall), with `makeSkyWcs` coming in at a rather surprising 10%. Those calculations are no doubt a necessary part of the tract/visit mapping, but it might be worth seeing if we can speed any of them up.
            Hide
            tjenness Tim Jenness added a comment -

            Is the point here that gen2 does not calculate regions around ingested raws at all?

            Show
            tjenness Tim Jenness added a comment - Is the point here that gen2 does not calculate regions around ingested raws at all?
            Hide
            Parejkoj John Parejko added a comment -

            That certainly is part of it. I don't know if it's all of it though. 6x slower is quite a lot.

            Show
            Parejkoj John Parejko added a comment - That certainly is part of it. I don't know if it's all of it though. 6x slower is quite a lot.
            Hide
            tjenness Tim Jenness added a comment -

            3 times isn't it?

            Show
            tjenness Tim Jenness added a comment - 3 times isn't it?
            Hide
            Parejkoj John Parejko added a comment -

            gen3 is 180s for just raws. gen2 is 60s for raw+calib+defect, and by-eye timing the raws are about half of that.

            Show
            Parejkoj John Parejko added a comment - gen3 is 180s for just raws. gen2 is 60s for raw+calib+defect, and by-eye timing the raws are about half of that.
            Hide
            tjenness Tim Jenness added a comment - - edited

            In theory we could modify ObservationInfo constructor so that you could select which properties you want calculated (or allow all the translations to be on demand). That would at least allow ingest to only ask for the handful of items it needs. I don't think gen2 ingest was switched over to use ObservationInfo yet (obs_lsst does).

            Show
            tjenness Tim Jenness added a comment - - edited In theory we could modify ObservationInfo constructor so that you could select which properties you want calculated (or allow all the translations to be on demand). That would at least allow ingest to only ask for the handful of items it needs. I don't think gen2 ingest was switched over to use ObservationInfo yet (obs_lsst does).
            Hide
            Parejkoj John Parejko added a comment -

            Thanks for the review comments. I believe I've addressed them all.

            New Jenkins run: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/31113/pipeline

            Show
            Parejkoj John Parejko added a comment - Thanks for the review comments. I believe I've addressed them all. New Jenkins run: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/31113/pipeline

              People

              Assignee:
              Parejkoj John Parejko
              Reporter:
              Parejkoj John Parejko
              Reviewers:
              Tim Jenness
              Watchers:
              Colin Slater, Jim Bosch, John Parejko, John Swinbank, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.