Data Management / DM-33039

Re-examine how to handle dataset management scripts

Details

    • Type: Improvement
    • Status: Done
    • Resolution: Done
    • Component: ap_verify

    Description

      Currently, there are two approaches to handling tools for dataset management:

      • The script for syncing a dataset's Gen 2 and Gen 3 content is (used to be?) located at https://github.com/lsst-dm/ap_verify_dataset_template/blob/main/scripts/add_gen3_repo.py, where it could theoretically be included in new datasets. This location is centralized, making it easier to update and improve on the script. However, it has proven an accessibility challenge for users who don't keep a copy of ap_verify_dataset_template handy (particularly Middleware, when the Gen 3 repository format was frequently changing).
      • The scripts for generating a dataset's contents from scratch, in either Gen 2 or Gen 3, are located in their individual scripts directories. This approach is easily accessible, but involves a large amount of code duplication that requires duplicate maintenance (for example, to deal with bit rot).

      In addition, Parejkoj proposed a third option, which is to maintain these scripts as part of ap_verify proper. This would combine the benefits of the above two approaches, but at the cost of flexibility – ap_verify should be agnostic to where datasets come from or how they are managed, and different datasets do sometimes require different processing (source selection criteria, quirks of particular observatories, etc.).

      Once the pressure of Gen 2 removal is behind us, revisit this issue to try to find the best solution for a pure Gen 3 world.

          Activity

            krzys Krzysztof Findeisen added a comment:

            While ap_verify_ci_hits2015 creates its own calibs using cp_pipe, it is likely that ap_verify_ci_cosmos_pdr2 will use the standard ones from /repo/main instead. This doesn't preclude sharing some code (options 1 and 3 above), but it would require that at least generate_all_gen3.sh be repository-specific.
            krzys Krzysztof Findeisen added a comment (edited):

            The decision made at the October 10 group meeting is to put example scripts in ap_verify_dataset_template, but to only run the copy of the script(s) from the same dataset (which the dataset author is free to tweak as appropriate for how that dataset is sourced).

            The new scope of this issue is to distill the contents of the extant repositories into something that makes a good "generic" setup, and to add instructions.
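            As a minimal sketch of the "only run the dataset's own copy" rule: the helper below is hypothetical (find_dataset_script is not part of ap_verify or the template), and it assumes each dataset keeps its scripts in a top-level scripts directory, as the description says the existing datasets do. It looks only inside the dataset itself and never falls back to ap_verify_dataset_template.

```python
from pathlib import Path


def find_dataset_script(dataset_root, script_name):
    """Return the path to a dataset's own copy of a management script.

    Hypothetical helper for illustration. Only ``<dataset_root>/scripts``
    is searched, so a missing script is reported as an error instead of
    silently falling back to the copy in the template repository.
    """
    script = Path(dataset_root) / "scripts" / script_name
    if not script.is_file():
        raise FileNotFoundError(f"{script_name} not found in {script.parent}")
    return script
```

            A dataset author who has tweaked generate_all_gen3.sh for their own data source would then always get their version, and a dataset that never copied the script from the template fails loudly.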

            krzys Krzysztof Findeisen added a comment:

            Thanks for agreeing to review this, kherner! For the main PR, I strongly recommend skipping the "Add ___ script" commits – they are pure copies of the existing files, and without them the review is only 400 lines instead of 1000.

            People

              Assignee: krzys Krzysztof Findeisen
              Reporter: krzys Krzysztof Findeisen
              Reviewer: Kenneth Herner
              Watchers (5): Eric Bellm, John Parejko, Kenneth Herner, Krzysztof Findeisen, Meredith Rawls
              Votes: 0
