# Fix decam gen3 ingest


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels: None
• Story Points: 8
• Sprint: AP S20-2 (January)
• Team:
• Urgent?: No

#### Description

It turns out the work I did on DECam for DM-20763 was not complete: I only tested that one dataId could be retrieved from the ingested raw data, but that file contains two CCDs in two different HDUs, and the second one is not actually getting ingested. The gen2 CameraMapper path for a DECam raw looks like decam%(visit)07d.fits.fz[%(hdu)d], so the HDU number is baked in there. As far as I can tell, gen3 doesn't have any way to encode the HDU number. Maybe this has to be a further specialization in the RawFormatter. File paths that go into butler.ingest have to actually exist, so we will have to do something to the raw paths that we pass into butler.ingest.
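To make the problem with the gen2 path concrete, here is a minimal sketch of how that template expands and why the result is not a real file path. Only the template string comes from the description; the helper names and the visit number are hypothetical, chosen for illustration.

```python
from typing import Tuple

# Gen2-style path template quoted in the description.
GEN2_RAW_TEMPLATE = "decam%(visit)07d.fits.fz[%(hdu)d]"


def expand_gen2_raw_path(visit: int, hdu: int) -> str:
    """Fill in the template using old-style % formatting."""
    return GEN2_RAW_TEMPLATE % {"visit": visit, "hdu": hdu}


def split_hdu_suffix(path: str) -> Tuple[str, int]:
    """Separate the on-disk file name from the trailing [hdu] index."""
    filename, _, rest = path.rpartition("[")
    return filename, int(rest.rstrip("]"))


# Hypothetical visit number, for illustration only.
expanded = expand_gen2_raw_path(visit=229388, hdu=1)
filename, hdu = split_hdu_suffix(expanded)
print(expanded)   # path with the HDU index baked in
print(filename)   # the only part that exists on disk
```

The expanded string carries the HDU index as a suffix, so only the part before the bracket names a file that exists, which is the part an ingest step that checks for existence could accept.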

The gen2 butler registry has an hdu field that encodes which hdu to get that raw from. It doesn't look like there's an equivalent hdu field anywhere in the gen3 registry: could we add an extra hdu field to the posix_datastore_records table? That's my best guess as to where it should live.

#### Attachments

1. gen3ingest.dot (100 kB)
2. gen3ingest.dot.pdf (67 kB)

#### Activity

John Swinbank added a comment -

(Blocked waiting for input from the middleware team.)

Tim Jenness added a comment -

Gen3 can now accept multiple datasets associated with a single file. You need to create multiple DatasetRef objects, one per detector. I've also changed formatters so that they can now access the dataId. This means that you can write a DecamRawFormatter that understands it needs to get the detector number from the data ID and use that to choose the correct extension from the file (possibly by looking up the mapping from detector number to extension number in the header).
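The lookup described in this comment could be sketched roughly as follows. This is not the DecamRawFormatter that was eventually written. It uses plain dictionaries in place of FITS headers so it runs without any LSST or FITS library. The header keyword CCDNUM, the function names, and the detector numbers are assumptions for illustration.

```python
from typing import Dict, List, Mapping


def build_detector_to_hdu(headers: List[Mapping[str, object]],
                          key: str = "CCDNUM") -> Dict[int, int]:
    """Map detector number -> HDU index by scanning each HDU's header.

    headers[i] stands in for the header of HDU i. HDUs without the
    detector keyword (e.g. a primary HDU) are skipped.
    """
    mapping: Dict[int, int] = {}
    for index, header in enumerate(headers):
        if key in header:
            mapping[int(header[key])] = index
    return mapping


def hdu_for_data_id(data_id: Mapping[str, int],
                    headers: List[Mapping[str, object]]) -> int:
    """Pick the HDU index for the detector named in the data ID."""
    detector = data_id["detector"]
    mapping = build_detector_to_hdu(headers)
    if detector not in mapping:
        raise ValueError(f"detector {detector} not found in this file")
    return mapping[detector]


# Stand-in headers for a two-CCD file: primary HDU plus two extensions.
example_headers = [
    {"INSTRUME": "DECam"},   # HDU 0: no detector keyword
    {"CCDNUM": 25},          # HDU 1
    {"CCDNUM": 26},          # HDU 2
]
print(hdu_for_data_id({"detector": 26}, example_headers))
```

With one DatasetRef per detector pointing at the same file, each read would go through a lookup like this to choose its own extension.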

John Parejko added a comment -

Note to self: once I get basic ingestion working for decam here, I should time it vs. gen2 ingestion using the hits2015 data to see how it compares speed-wise.

John Parejko added a comment - - edited

I think I've got this working! I haven't run a test on the full hits2015 yet, but the test I wrote much earlier now passes (both hdus of testdata_decam get ingested).

Jenkins: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/31084/pipeline

John Parejko added a comment -

Tim Jenness, would you be able to review these changes? It's about 200 lines of python, plus 10 lines of pybind11 and a new decam zeroed-image FITS file.

Tim Jenness added a comment -

Looks good. Some minor comments on PRs.

John Parejko added a comment - - edited

Update with some timing information using ap_verify_hits2015 data on my desktop:

• gen2 ingest of raw+calib+defects takes about 60 seconds (defects is maybe 5s of that).
• gen3 ingest of just raws takes about 180 seconds without calculating checksums.

Thus, it looks like gen3 is about 6x slower (~half of the gen2 time is raws). I've attached a call graph for the gen3 ingest: visitInfo/obsInfo and region calculation seem to be high bars (~30% overall), with makeSkyWcs coming in at a rather surprising 10%. Those calculations are no doubt a necessary part of the tract/visit mapping, but it might be worth seeing if we can speed any of them up.

Tim Jenness added a comment -

Is the point here that gen2 does not calculate regions around ingested raws at all?

John Parejko added a comment -

That certainly is part of it. I don't know if it's all of it though. 6x slower is quite a lot.

Tim Jenness added a comment -

3 times, isn't it?

John Parejko added a comment -

gen3 is 180s for just raws. gen2 is 60s for raw+calib+defect, and by my by-eye timing the raws are about half of that.
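The two ratios in this exchange come from dividing by different gen2 baselines. A short worked calculation, using only the timings quoted in this thread (the one-half raw fraction is the by-eye estimate, not a measurement):

```python
GEN3_RAW_SECONDS = 180.0    # gen3 ingest of raws only
GEN2_TOTAL_SECONDS = 60.0   # gen2 ingest of raw + calib + defects
GEN2_RAW_FRACTION = 0.5     # by-eye estimate of the raw share of gen2 time

# Comparing against the whole gen2 run gives the "3 times" figure.
ratio_vs_total = GEN3_RAW_SECONDS / GEN2_TOTAL_SECONDS

# Comparing raws against raws gives the "6x" figure.
gen2_raw_seconds = GEN2_TOTAL_SECONDS * GEN2_RAW_FRACTION
ratio_vs_raws = GEN3_RAW_SECONDS / gen2_raw_seconds

print(ratio_vs_total, gen2_raw_seconds, ratio_vs_raws)
```

The like-for-like figure depends directly on the estimated raw fraction, so it is only as good as that by-eye number.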

Tim Jenness added a comment - - edited

In theory we could modify the ObservationInfo constructor so that you could select which properties you want calculated (or allow all the translations to be on demand). That would at least allow ingest to only ask for the handful of items it needs. I don't think gen2 ingest was switched over to use ObservationInfo yet (although obs_lsst does use it).
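The "on demand" idea could look something like the sketch below. It is a generic lazy-evaluation pattern, not the ObservationInfo API. The class name, the translator table, and the header keywords are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Mapping

Header = Mapping[str, Any]


class LazyObservationInfo:
    """Hypothetical stand-in: translate each property only when asked."""

    def __init__(self, header: Header,
                 translators: Dict[str, Callable[[Header], Any]]) -> None:
        self._header = header
        self._translators = translators
        self._cache: Dict[str, Any] = {}

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal attribute lookup fails.
        if name.startswith("_") or name not in self._translators:
            raise AttributeError(name)
        if name not in self._cache:
            self._cache[name] = self._translators[name](self._header)
        return self._cache[name]

    @property
    def computed(self) -> List[str]:
        """Names of the properties that have actually been translated."""
        return sorted(self._cache)


# Hypothetical translators; the header keywords are assumptions.
TRANSLATORS: Dict[str, Callable[[Header], Any]] = {
    "exposure_time": lambda h: float(h["EXPTIME"]),
    "detector_num": lambda h: int(h["CCDNUM"]),
    "object_name": lambda h: str(h["OBJECT"]),
}

info = LazyObservationInfo(
    {"EXPTIME": "30.0", "CCDNUM": "25", "OBJECT": "field"}, TRANSLATORS
)
detector = info.detector_num   # triggers exactly one translation
print(detector, info.computed)
```

An ingest step that needs only a few properties would then pay only for those translations, at the cost of deferring any translation errors until first access.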

John Parejko added a comment -

Thanks for the review comments. I believe I've addressed them all.

New Jenkins run: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/31113/pipeline

#### People

Assignee: John Parejko
Reporter: John Parejko
Reviewers: Tim Jenness
Watchers: Colin Slater, Jim Bosch, John Parejko, John Swinbank, Tim Jenness