DM-12012 (Data Management)

Add more filetypes to the WCL to clean up "junk" files from the previous mini HSC test

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Starting with the mini test WCL file from DM-10474, go through the junk tarballs and add new filetypes as needed. Ignore files that were renamed only so that the Butler could locate them.

      Files that need to be handled:

      • SRCMATCH
      • SRCMATCHFULL
      • ICSRC
      • BKGD
      • metadata boost files
      • thumbnail PNGs

      If possible, also handle:

      • (output) schema files
      • (output) config files
      • package version pickles
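One way to carry out the audit described above is to list the members of a junk tarball and drop every path that matches a filetype already declared in the WCL; whatever remains is a candidate for a new filetype. Below is a minimal sketch using Python's standard `tarfile` and `fnmatch` modules. It assumes the declared filetypes can be approximated by glob patterns; the patterns, the member names, and the helpers `untracked_members()` and `make_demo_tar()` are hypothetical and only stand in for the real WCL and tarballs.

```python
import fnmatch
import io
import tarfile

# Hypothetical glob patterns standing in for filetypes already declared in the WCL.
TRACKED_PATTERNS = ["*/calexp/*.fits", "*/srcMatch/*.fits"]


def untracked_members(tar_bytes: bytes, patterns) -> list:
    """Return the sorted regular-file names in a tarball that match no pattern."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        names = [member.name for member in tar.getmembers() if member.isfile()]
    # fnmatchcase keeps matching case-sensitive on every platform.
    return sorted(
        name for name in names
        if not any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
    )


def make_demo_tar(names) -> bytes:
    """Build an in-memory tarball of empty files, standing in for a real junk tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            info = tarfile.TarInfo(name)  # size defaults to 0, type to a regular file
            tar.addfile(info, io.BytesIO(b""))
    return buf.getvalue()


# Hypothetical tarball contents: two tracked outputs plus three untracked files.
junk = make_demo_tar([
    "jobrepo/registry.sqlite3",
    "jobrepo/repositoryCfg.yaml",
    "jobrepo/calexp/CALEXP-0903986-016.fits",
    "jobrepo/srcMatch/SRCMATCH-0903986-016.fits",
    "jobrepo/thumbs/thumb-0903986-016.png",
])
print(untracked_members(junk, TRACKED_PATTERNS))
# ['jobrepo/registry.sqlite3', 'jobrepo/repositoryCfg.yaml', 'jobrepo/thumbs/thumb-0903986-016.png']
```

In this toy example, adding a pattern for the thumbnail PNGs would leave only the two repository bookkeeping files in the output.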


            Activity

            hchiang2 Hsin-Fang Chiang added a comment -

            So far, all boost files and other typical dataset files have been added to the WCL, so they are now tracked and no longer show up in the junk tarballs.

            It turns out that schema and config files are quite different from typical dataset files. As implemented in the w_2017_33 stack, schema/config datasets are not associated with individual data IDs (and cannot be without a stack change). This appears to be because the CmdLineTask implementation assumes there is only one config: at https://github.com/lsst/pipe_base/blob/w.2017.33/python/lsst/pipe/base/cmdLineTask.py#L604,
            butler.put() is called without a dataId. By contrast, a Butler get/put usually requires a dataId, or is done via a dataRef (e.g. https://github.com/lsst/pipe_tasks/blob/w.2017.33/python/lsst/pipe/tasks/multiBand.py#L1097).
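To make the distinction concrete, here is a minimal, self-contained sketch (it does not use the LSST stack) of how a filename template resolves against a data ID. The template strings, data IDs, and the `resolve()` helper are hypothetical, written in the %-formatting style of Gen2 Butler policy templates: a per-CCD dataset template contains data ID keys, so different jobs resolve to different files, while a config template has no keys, so every job resolves to the same file.

```python
# Hypothetical filename templates in the %-formatting style of Gen2 Butler
# policy files. They are illustrative, not copied from any obs_* package.
TEMPLATES = {
    # Per-CCD dataset: the path depends on the data ID.
    "calexp": "calexp/%(visit)07d/%(filter)s/CALEXP-%(visit)07d-%(ccd)03d.fits",
    # Config dataset: no data ID keys, so one file per repository.
    "processCcd_config": "config/processCcd.py",
}


def resolve(dataset_type: str, data_id: dict) -> str:
    """Fill a filename template with the values from a data ID."""
    # With a mapping on the right-hand side, %-formatting ignores unused keys,
    # so a template without placeholders resolves to the same string every time.
    return TEMPLATES[dataset_type] % data_id


# Two hypothetical jobs that differ only in CCD.
job_a = {"visit": 903986, "filter": "HSC-I", "ccd": 16}
job_b = {"visit": 903986, "filter": "HSC-I", "ccd": 22}

print(resolve("calexp", job_a))             # calexp/0903986/HSC-I/CALEXP-0903986-016.fits
print(resolve("calexp", job_b))             # calexp/0903986/HSC-I/CALEXP-0903986-022.fits
print(resolve("processCcd_config", job_a))  # config/processCcd.py
print(resolve("processCcd_config", job_b))  # config/processCcd.py
```

Because both jobs resolve to `config/processCcd.py`, the filename alone cannot record which data ID a config belongs to.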

            I was therefore not able to add a dataId to the Butler filename templates for schema/config. After discussion with Michelle Gower, I'll instead ingest the full config files into the DBB, use them as inputs, and let the tasks do their usual config checks.
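As far as I understand the w_2017_33 CmdLineTask, the "usual config checks" compare the config a task is about to use with the one already stored in the repository and stop with an error if they differ. The sketch below models only that idea with plain dictionaries; `compare_configs()` and `check_config()` are hypothetical helpers, not the `lsst.pex.config` API, and the field names are made up for illustration.

```python
# Sentinel so that a field missing from one config never compares equal to a real value.
_MISSING = object()


def compare_configs(stored: dict, current: dict) -> list:
    """Return the sorted names of fields that differ or exist in only one config."""
    names = set(stored) | set(current)
    return sorted(
        name for name in names
        if stored.get(name, _MISSING) != current.get(name, _MISSING)
    )


def check_config(stored: dict, current: dict) -> None:
    """Raise if the stored (ingested) config does not match the current one."""
    diffs = compare_configs(stored, current)
    if diffs:
        raise RuntimeError(
            "Config does not match stored config; differing fields: " + ", ".join(diffs)
        )


# Hypothetical values: the full config ingested into the DBB vs. the one a job builds.
ingested = {"doCalibrate": True, "charImage.repair.doCosmicRay": True}
current = dict(ingested)
check_config(ingested, current)  # identical configs: no error

current["doCalibrate"] = False
try:
    check_config(ingested, current)
except RuntimeError as err:
    print(err)  # Config does not match stored config; differing fields: doCalibrate
```

Ingesting the full config and feeding it back in as an input means a mismatch surfaces as an explicit error rather than as silently different outputs.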

            hchiang2 Hsin-Fang Chiang added a comment -

            Michelle Gower, could you please review this? I've uploaded the WCL file and the most up-to-date Butler template YAML to a new GitHub repo at https://github.com/lsst-dm/prod_wcl
            The pull request for this ticket is here: https://github.com/lsst-dm/prod_wcl/pull/1

            With this updated WCL and the database/files in dbb-beta, I was able to run it; the run finished successfully, and the only files left in the junk tarballs are:

            In the "processccd" block:
            jobrepo/calibRegistry.sqlite3
            jobrepo/repositoryCfg.yaml
            jobrepo/registry.sqlite3
            jobrepo/STRIPE82L/2013-11-02/00671/HSC-I/HSC-0903986-016.fits and the raw input file of the job

            In the "drp-patch" block:
            jobrepo/repositoryCfg.yaml
            jobrepo/registry.sqlite3

            As far as I know, the repositoryCfg.yaml and sqlite3 file names are hardcoded in the stack and are required by the current stack implementation. As for the raw files, I don't see a straightforward way to avoid renaming them. More notes are given in the individual git commit messages.
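The raw files are renamed so that the Butler can locate them: it finds a raw exposure by filling a filename template with the data ID, so the input has to sit at exactly that path inside the job repository. Below is a minimal sketch of that staging step. The raw template is reconstructed from the leftover path listed above (`STRIPE82L/2013-11-02/00671/HSC-I/HSC-0903986-016.fits`) and is an assumption, as are the data ID key names and the hypothetical `raw_path()` and `stage_raw()` helpers. It copies rather than renames so the original input is left untouched.

```python
import os
import shutil
import tempfile

# Assumed raw filename template, reconstructed from the leftover path in this ticket.
RAW_TEMPLATE = "%(field)s/%(dateObs)s/%(pointing)05d/%(filter)s/HSC-%(visit)07d-%(ccd)03d.fits"


def raw_path(data_id: dict) -> str:
    """Path, relative to the repository root, where the Butler would look for a raw."""
    return RAW_TEMPLATE % data_id


def stage_raw(src: str, repo_root: str, data_id: dict) -> str:
    """Copy a raw input file to its templated location inside a job repository."""
    dest = os.path.join(repo_root, raw_path(data_id))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copyfile(src, dest)
    return dest


# Data ID matching the leftover file; key names are assumptions.
data_id = {"field": "STRIPE82L", "dateObs": "2013-11-02", "pointing": 671,
           "filter": "HSC-I", "visit": 903986, "ccd": 16}
print(raw_path(data_id))  # STRIPE82L/2013-11-02/00671/HSC-I/HSC-0903986-016.fits

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "original_raw.fits")  # hypothetical original file name
    with open(src, "wb") as f:
        f.write(b"")  # empty placeholder instead of real FITS data
    staged = stage_raw(src, os.path.join(tmp, "jobrepo"), data_id)
    print(os.path.relpath(staged, tmp))
```

Since the templated name is what the Butler looks up, the staged copy ends up in the junk tarball unless a filetype is declared for it.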

            mgower Michelle Gower added a comment -

            Thanks for putting the WCL in git.

            hchiang2 Hsin-Fang Chiang added a comment -

            Thanks!

            Merged.


              People

              • Assignee: Hsin-Fang Chiang (hchiang2)
                Reporter: Hsin-Fang Chiang (hchiang2)
                Reviewers: Michelle Gower
                Watchers: Hsin-Fang Chiang, Michelle Gower
              • Votes: 0
