Data Management: DM-33039

Re-examine how to handle dataset management scripts

    Details

    • Type: Improvement
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: ap_verify
    • Labels:
      None

      Description

      Currently, there are two approaches to handling tools for dataset management:

      • The script for syncing a dataset's Gen 2 and Gen 3 content is (used to be?) located at https://github.com/lsst-dm/ap_verify_dataset_template/blob/main/scripts/add_gen3_repo.py, where it could theoretically be included in new datasets. This location is centralized, which makes the script easier to update and improve. However, it has proven hard to access for users who don't keep a copy of ap_verify_dataset_template handy (particularly Middleware, when the Gen 3 repository format was frequently changing).
      • The scripts for generating a dataset's contents from scratch, in either Gen 2 or Gen 3, are located in each dataset's own scripts directory. This approach is easily accessible, but it duplicates a large amount of code, and every copy must be maintained separately (for example, to deal with bit rot).

      In addition, John Parejko proposed a third option, which is to maintain these scripts as part of ap_verify proper. This would combine the benefits of the above two approaches, but at the cost of flexibility – ap_verify should be agnostic to where datasets come from or how they are managed, and different datasets do sometimes require different processing (source selection criteria, quirks of particular observatories, etc.).

      Once the pressure of Gen 2 removal is behind us, revisit this issue to try to find the best solution for a pure Gen 3 world.

            Activity

            krzys Krzysztof Findeisen added a comment -

            While ap_verify_ci_hits2015 creates its own calibs using cp_pipe, it is likely that ap_verify_ci_cosmos_pdr2 will use the standard ones from /repo/main instead. While this doesn't exclude us from sharing some code (options 1 and 3 above), it would require that at least generate_all_gen3.sh be repository-specific.

            krzys Krzysztof Findeisen added a comment - edited

            The decision made at the October 10 group meeting is to put example scripts in ap_verify_dataset_template, but to run only the copy of the script(s) that lives in the dataset itself (which the dataset author is free to tweak as appropriate for how that dataset is sourced).

            The new scope of this issue is to distill the contents of the extant repositories into something that makes a good "generic" setup, and to add instructions.
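
One consequence of keeping a per-dataset copy is that copies can drift from the template over time. The following is a minimal sketch, not part of ap_verify or the template, of how one might list where a dataset's scripts differ from the template's. The function name `script_drift` and the directory arguments are hypothetical, and it assumes both repositories are checked out locally with their scripts in a single flat directory.

```python
import difflib
from pathlib import Path


def script_drift(template_dir, dataset_dir, pattern="*"):
    """Compare scripts in a template directory with a dataset's copies.

    Hypothetical helper: returns a dict keyed by script file name. The value
    is a unified diff (str) when the two copies differ, or None when the
    dataset has no copy of that script. Identical scripts are omitted.
    """
    template_dir, dataset_dir = Path(template_dir), Path(dataset_dir)
    drift = {}
    for template_script in sorted(template_dir.glob(pattern)):
        if not template_script.is_file():
            continue
        local_copy = dataset_dir / template_script.name
        if not local_copy.exists():
            drift[template_script.name] = None
            continue
        old = template_script.read_text().splitlines(keepends=True)
        new = local_copy.read_text().splitlines(keepends=True)
        diff = list(difflib.unified_diff(
            old, new,
            fromfile=f"template/{template_script.name}",
            tofile=f"dataset/{template_script.name}",
        ))
        if diff:
            drift[template_script.name] = "".join(diff)
    return drift
```

For example, `script_drift("ap_verify_dataset_template/scripts", "ap_verify_ci_hits2015/scripts")` (paths assumed to be local checkouts) would report intentional dataset-specific tweaks alongside accidental divergence; deciding which is which remains a manual step.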

            krzys Krzysztof Findeisen added a comment -

            Thanks for agreeing to review this, Kenneth Herner! For the main PR, I strongly recommend skipping the "Add ___ script" commits – they are pure copies of the existing files, and without them the review is only 400 lines instead of 1000.


              People

              Assignee:
              krzys Krzysztof Findeisen
              Reporter:
              krzys Krzysztof Findeisen
              Reviewers:
              Kenneth Herner
              Watchers:
              Eric Bellm, John Parejko, Kenneth Herner, Krzysztof Findeisen, Meredith Rawls
