DM-19638

Create parent task/script for bootstrapping Gen3 repos

    Details

      Description

      Building on recent work to enable bootstrapping the Gen3 ci_hsc dataset with minimal reliance on gen2convert (DM-19614, DM-19615, DM-19622, DM-19531, DM-19272) and improvements to gen2convert itself (DM-18023), write a task/script that can be used to similarly bootstrap other Gen3 repos, starting with RC2.

      Steps will include:

      • register instrument(s)
      • add curated calibrations, via Instrument.writeCuratedCalibrations
      • register skymap(s)
      • ingest raws, via RawIngestTask
      • add Gen2-calib-repo calibrations, refcats, and BrightObjectMasks via explicit, lower-level calls to gen2convert code in daf_butler.

      I do not currently plan to include actually creating the empty repository, which will save this task from having to somehow read in and apply overrides to butler yaml configuration.
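The ordering of the steps above can be sketched as a small driver that runs each step against a repository that already exists, since creating the empty repo is out of scope here. This is only an illustration under stated assumptions: `FakeRepo`, `run_bootstrap`, and the step functions are hypothetical stand-ins, and the real calls named in the description (Instrument.writeCuratedCalibrations, RawIngestTask, the gen2convert code in daf_butler) appear only in comments because their signatures are not given in this ticket.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class FakeRepo:
    """Hypothetical in-memory stand-in for an already-created, empty Gen3 repo."""

    contents: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, kind: str, name: str) -> None:
        self.contents.setdefault(kind, []).append(name)


def register_instruments(repo: FakeRepo) -> None:
    # Placeholder for instrument registration ("HSC" chosen because RC2 is the first target).
    repo.add("instrument", "HSC")


def write_curated_calibrations(repo: FakeRepo) -> None:
    # Placeholder for Instrument.writeCuratedCalibrations; assumed to need an instrument first.
    if not repo.contents.get("instrument"):
        raise RuntimeError("register an instrument before adding curated calibrations")
    repo.add("curated_calibration", "HSC curated calibrations")


def register_skymaps(repo: FakeRepo) -> None:
    # Placeholder; the skymap name is made up for the example.
    repo.add("skymap", "example_skymap")


def ingest_raws(repo: FakeRepo) -> None:
    # Placeholder for RawIngestTask.
    repo.add("raw", "example_raw_exposure")


def convert_gen2_extras(repo: FakeRepo) -> None:
    # Placeholder for the explicit, lower-level gen2convert calls in daf_butler.
    for kind in ("calibration", "refcat", "bright_object_mask"):
        repo.add(kind, f"converted_{kind}")


# Steps in the order listed in the description.
BOOTSTRAP_STEPS: List[Tuple[str, Callable[[FakeRepo], None]]] = [
    ("register instrument(s)", register_instruments),
    ("add curated calibrations", write_curated_calibrations),
    ("register skymap(s)", register_skymaps),
    ("ingest raws", ingest_raws),
    ("add Gen2 calibrations, refcats, and BrightObjectMasks", convert_gen2_extras),
]


def run_bootstrap(repo: FakeRepo, steps=BOOTSTRAP_STEPS) -> List[str]:
    """Run every step in order against an existing repo; return the labels completed."""
    completed = []
    for label, step in steps:
        step(repo)
        completed.append(label)
    return completed


if __name__ == "__main__":
    repo = FakeRepo()  # the empty repo is created elsewhere, not by this script
    for label in run_bootstrap(repo):
        print("done:", label)
```

Keeping the steps as a list of plain callables is one way to let a per-dataset repo such as gen3-hsc-rc2 reorder or skip steps without changing the driver.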

            Activity

            jbosch Jim Bosch created issue -
            jbosch Jim Bosch made changes -
            Field Original Value New Value
            Epic Link DM-16771 [ 235477 ]
            jbosch Jim Bosch made changes -
            Description: the original value was identical to the current Description above except that the skymap step read "register skymap(s) - via the new MakeGen3SkyMapTask of DM-19531"; the new value drops that clause.
            jbosch Jim Bosch made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            jbosch Jim Bosch added a comment -

            I'm hoping to get this into the weekly on Friday, so I figured I'd send this to you now even though I'm still testing my cleanups to obs_base (where most of the code is) before putting them on the tickets branch.  So the PRs that are ready now are:

            • daf_butler: https://github.com/lsst/daf_butler/pull/158
            • obs_subaru: https://github.com/lsst/obs_subaru/pull/201
            • the new gen3-hsc-rc2 repo: https://github.com/lsst-dm/gen3-hsc-rc2/pull/1

            The last is in lsst-dm, so it doesn't formally need a review, but I figure it's always good to get two eyes on any new code.

            I'll post the obs_base PR as soon as that testing completes and I've moved my cleaned-up version from u/jbosch/ to tickets/.

            jbosch Jim Bosch made changes -
            Reviewers Hsin-Fang Chiang [ hchiang2 ]
            Status In Progress [ 3 ] In Review [ 10004 ]
            jbosch Jim Bosch added a comment -

            Testing succeeded, and the obs_base PR is up: https://github.com/lsst/obs_base/pull/150.

            hchiang2 Hsin-Fang Chiang made changes -
            Link This issue relates to DM-19797 [ DM-19797 ]
            hchiang2 Hsin-Fang Chiang added a comment -

            Not sure I understand daf_butler & obs_base enough to say much about them, but in general looks okay so I'm marking it Reviewed. I'll take another look tomorrow. Not sure if you'd want somebody with better understanding of the design to take a look too.

            hchiang2 Hsin-Fang Chiang made changes -
            Status In Review [ 10004 ] Reviewed [ 10101 ]
            hchiang2 Hsin-Fang Chiang added a comment -

            Do you think anything in this ticket could possibly change the behavior in multiprocessing? When I test-ran pipetask with the ticket branches and -j 12, I got:

              File "/software/lsstsw/stack_20190330/stack/miniconda3-4.5.12-1172c30/Linux64/pipe_base/17.0.1-2-g3e5d191+31/python/lsst/pipe/base/pipelineTask.py", line 579, in runQuantum
                self.saveStruct(struct, outputDataRefs, butler)
              File "/software/lsstsw/stack_20190330/stack/miniconda3-4.5.12-1172c30/Linux64/pipe_base/17.0.1-2-g3e5d191+31/python/lsst/pipe/base/pipelineTask.py", line 613, in saveStruct
                butler.put(data, dataRef.datasetType.name, dataRef.dataId)
              File "/home/hchiang2/stack/DM19638/daf_butler/python/lsst/daf/butler/core/utils.py", line 181, in inner
                return func(self, *args, **kwargs)
              File "/home/hchiang2/stack/DM19638/daf_butler/python/lsst/daf/butler/butler.py", line 365, in put
                raise TypeError("Butler is read-only.")
            TypeError: Butler is read-only.
            

            while running on the same repo seemed to work fine with w_2019_19. However, the error doesn't look related to the code changes and I haven't tested carefully, so please just ignore this if it doesn't ring a bell.

            jbosch Jim Bosch added a comment -

            Thanks for the quick review.  The obs_base changes aren't really a part of any larger design, so I don't think there necessarily is anyone better.  There might be some changes in daf_butler that Tim Jenness might be interested in, but as he's traveling I'll just leave this ping here but not hold up the ticket on him taking a look.

             

            I'm afraid I don't think this ticket could be in play for the multiprocessing problem you're seeing - if it is, it'd have to be a very strange interaction.

            jbosch Jim Bosch added a comment -

            I've created an lsst.log PR for the suggestion to move the log-level context manager to that package: https://github.com/lsst/log/pull/41.

            Jenkins is running: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/29843/pipeline
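
For readers unfamiliar with the pattern, a log-level context manager of the kind mentioned above can be sketched with the Python standard library `logging` module. This is an illustration only, not the code in the lsst.log PR; the name `temporary_log_level` and the logger name `example.ingest` are hypothetical.

```python
import logging
from contextlib import contextmanager


@contextmanager
def temporary_log_level(name, level):
    """Set the named logger to `level` inside the block, then restore the old level."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore even if the body raises, so the change never leaks out of the block.
        logger.setLevel(previous)


if __name__ == "__main__":
    log = logging.getLogger("example.ingest")
    log.setLevel(logging.INFO)
    with temporary_log_level("example.ingest", logging.WARNING):
        log.info("suppressed while the level is WARNING")
    log.info("emitted again if a handler is configured")
```

Restoring the level in a `finally` clause is the main design point: a noisy step can be quieted without affecting later steps, even when that step fails.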

             

            jbosch Jim Bosch made changes -
            Resolution Done [ 10000 ]
            Status Reviewed [ 10101 ] Done [ 10002 ]
            jbosch Jim Bosch added a comment -

            Hsin-Fang Chiang, sorry about my skepticism about your error report above being related to this ticket - it absolutely was, but it took breaking the weekly and some debugging by Nate Lust for me to figure it out.

            czw Christopher Waters made changes -
            Link This issue contains DM-18027 [ DM-18027 ]
            jbosch Jim Bosch made changes -
            Link This issue relates to DM-19961 [ DM-19961 ]

              People

              • Assignee:
                jbosch Jim Bosch
                Reporter:
                jbosch Jim Bosch
                Reviewers:
                Hsin-Fang Chiang
                Watchers:
                Hsin-Fang Chiang, Jim Bosch