Fix Version/s: None
The Gen 3 preloaded/ repository currently contains curated calibs, as a side effect of its creation using Gen 2->3 conversion. However, preloading curated calibs increases the size of the download, causes unnecessary repository churn, and causes potential conflicts with template skymaps (possibly fixed upstream). Investigate whether curated calibs can be left out of the downloaded dataset, and picked up from <obs>_data during ingestion instead.
This issue cannot be done while we are maintaining datasets using Gen 2->3 conversion, as the dataset ingestion code must assume either that curated calibs are present or that they are absent.
- is blocked by
DM-28628 Actually adopt new calibs in ap_verify HiTS datasets
- To Do
DM-29857 Create pure Gen 3 dataset management scripts for ap_verify datasets
DM-25414 Guard against re-ingestion of curated calibs in ap_verify
- relates to
DM-33039 Re-examine how to handle dataset management scripts
I've implemented the proposed changes and have some specific numbers. On the branch, dataset ingestion for ap_verify_ci_hits2015 runs 2:00 – 2:20 (min:sec) slower than on main, but the branch dataset is only 0.1 GB smaller (out of 2.5 GB). So, while the change may still be useful for avoiding repository churn, it's not worth it for the download savings alone.
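For a rough sense of scale, the numbers above can be turned into a break-even download speed. This is only a back-of-the-envelope sketch: it assumes 1 GB = 1000 MB, that download time scales linearly with size, and that the slowdown is 120 to 140 seconds; the function and variable names are made up for illustration and are not part of ap_verify.

```python
def breakeven_bandwidth_mb_per_s(saved_gb: float, extra_seconds: float) -> float:
    """Return the download speed (MB/s) at which the time saved by the
    smaller download equals the extra ingestion time.

    Assumes 1 GB = 1000 MB and download time proportional to size.
    """
    return saved_gb * 1000.0 / extra_seconds


# Figures quoted in this comment; the variable names are illustrative.
saved_gb = 0.1               # branch dataset is 0.1 GB smaller
total_gb = 2.5               # size of the dataset on main
slowdown_s = (120.0, 140.0)  # ingestion is 2:00 to 2:20 slower on the branch

fraction_saved = saved_gb / total_gb
breakeven = [breakeven_bandwidth_mb_per_s(saved_gb, s) for s in slowdown_s]

print(f"download size reduction: {fraction_saved:.0%}")
print(f"break-even speed: {min(breakeven):.2f} to {max(breakeven):.2f} MB/s")
```

Under these assumptions the download shrinks by about 4%, and the branch only comes out ahead in total wall-clock time on connections slower than roughly 0.7 to 0.8 MB/s.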
I've decided that repository churn alone can't justify removing the curated calibs – it's hard to even propose it without trying to make an arbitrary distinction between image calibs (which we do want to update in response to code changes) and curated calibs.
I've pushed the implementation to the u/kfindeisen/DM-32975-write-curated-on-ingest branch for future reference. The pushed code works on ap_verify_ci_hits2015, but does not yet have unit tests.
Running butler write-curated-calibrations for DECam takes about 2 minutes, so the trade-off is non-trivial.