Data Management / DM-32975

Investigate omitting curated calibs from ap_verify dataset

    Details

    • Type: Improvement
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: ap_verify
    • Labels:
      None

      Description

      The Gen 3 preloaded/ repository currently contains curated calibs, as a side effect of having been created through Gen 2->3 conversion. However, preloading curated calibs increases the size of the download, causes unnecessary repository churn, and can conflict with template skymaps (possibly fixed upstream). Investigate whether curated calibs can be left out of the downloaded dataset and picked up from <obs>_data during ingestion instead.

      This work cannot be done while datasets are still maintained through Gen 2->3 conversion, because the dataset ingestion code must assume either that curated calibs are present or that they are absent.

            Activity

            krzys Krzysztof Findeisen added a comment -

            Running butler write-curated-calibrations for DECam takes about 2 minutes, so the trade-off between download size and ingestion time is non-trivial.

            krzys Krzysztof Findeisen added a comment -

            I've implemented the proposed changes and have some specific numbers. On the branch, dataset ingestion for ap_verify_ci_hits2015 runs 2:00 to 2:20 (minutes:seconds) slower than on main, but the branch dataset is only 0.1 GB smaller out of 2.5 GB. So, while the change may still be useful for avoiding repository churn, it's not worth it for the download savings alone.
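
To put those numbers in perspective, here is a rough back-of-the-envelope sketch (not part of the branch). It assumes decimal units (1 GB = 1e9 bytes) and treats 2.5 GB as the full download size. The function names are made up for illustration. It computes the fraction of the download that is saved and the link speed at which the download time saved would equal the extra ingestion time.

```python
# Figures quoted in the comment above.
SIZE_SAVED_GB = 0.1          # branch dataset is 0.1 GB smaller
DATASET_SIZE_GB = 2.5        # total dataset size (assumed to be the full download)
EXTRA_INGEST_S = (120, 140)  # ingestion is 2:00 to 2:20 slower

# Assumption: 1 GB = 1e9 bytes (decimal units).
BYTES_PER_GB = 1e9


def fraction_saved(saved_gb, total_gb):
    """Fraction of the download removed by omitting curated calibs."""
    return saved_gb / total_gb


def break_even_mbit_per_s(saved_gb, extra_seconds):
    """Link speed at which download time saved equals extra ingest time.

    On any faster link, the smaller download saves less time than the
    slower ingestion costs.
    """
    bits_saved = saved_gb * BYTES_PER_GB * 8
    return bits_saved / extra_seconds / 1e6


if __name__ == "__main__":
    saved = fraction_saved(SIZE_SAVED_GB, DATASET_SIZE_GB)
    print(f"download reduced by {saved:.0%}")
    for seconds in EXTRA_INGEST_S:
        speed = break_even_mbit_per_s(SIZE_SAVED_GB, seconds)
        print(f"{seconds} s extra ingest: break-even at about {speed:.1f} Mbit/s")
```

Under these assumptions the download shrinks by about 4%, and the smaller dataset only comes out ahead on links slower than roughly 6 to 7 Mbit/s.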

            krzys Krzysztof Findeisen added a comment -

            I've decided that repository churn alone can't justify removing the curated calibs – it's hard to even propose it without trying to make an arbitrary distinction between image calibs (which we do want to update in response to code changes) and curated calibs.

            I've pushed the implementation to the u/kfindeisen/DM-32975-write-curated-on-ingest branch for future reference. The pushed code works on ap_verify_ci_hits2015, but does not yet have unit tests.


              People

              Assignee:
              krzys Krzysztof Findeisen
              Reporter:
              krzys Krzysztof Findeisen