Data Management / DM-22771

Resurrect HSC RC2 Gen3 repo bootstrap

    Details

      Description

      The refactoring of the Gen2 conversion tools on DM-17023 broke the scripts we had previously put together for creating a Gen3 data repository containing the HSC RC2 dataset. At the time this wasn't considered worth fixing, because we still hadn't addressed the issues that made RC2 ingest and preflight painfully slow. Those issues have now been partially addressed on DM-21768 (i.e. ingest should now be faster), so it's time to get the RC2 bootstrap working again.

      This ticket is for getting the new Gen2 conversion tools working on RC2. After that's done, we may want to also make some (hopefully minor) improvements to the yaml import/export mechanism to provide a faster way of re-creating the Gen3 repo from scratch, given that a lot of the slowness in Gen2 conversion is due to limitations in how fast we can discover related datasets in the Gen2 repo.

            Activity

            jbosch Jim Bosch created issue -
            jbosch Jim Bosch made changes -
            Epic Link: set to DM-22586 [ 427653 ]
            jbosch Jim Bosch made changes -
            Link: This issue is blocked by DM-21768 [ DM-21768 ]
            jbosch Jim Bosch made changes -
            Status: To Do [ 10001 ] → In Progress [ 3 ]
            jbosch Jim Bosch added a comment -

            Hsin-Fang Chiang, do you have time to take this review? The purpose of it all is to get Gen3 HSC RC2 bootstrap working again, but the only big changes are to the gen2to3 logic in obs_base to make it faster (by not descending into directories that don't contain anything we want). Changes in all other packages are simple, and I'd recommend looking at all of them commit-by-commit to get the clearest picture of the changes.

            Since Jira hasn't found them all, the PRs are:

            https://github.com/lsst/daf_butler/pull/219
            https://github.com/lsst/obs_base/pull/197
            https://github.com/lsst/obs_subaru/pull/241
            https://github.com/lsst-dm/gen3-hsc-rc2/pull/2
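
            As a rough illustration of the speedup mentioned in the comment above (not descending into directories that cannot contain anything wanted), here is a minimal sketch using only the Python standard library. It is not the obs_base repo walker code: the function names, the directory layout, and the regular expressions are all hypothetical and chosen only to show the pruning idea.

```python
import os
import re
import tempfile


def find_wanted_files(root, wanted, skip_dir):
    """Return sorted relative paths under `root` that match `wanted`.

    Directories whose name matches `skip_dir` are pruned, so their
    contents are never listed at all. (Hypothetical helper, for
    illustration only.)
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # With topdown=True, editing `dirnames` in place tells os.walk
        # which subdirectories to skip entirely.
        dirnames[:] = sorted(d for d in dirnames if not skip_dir.fullmatch(d))
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")
            if wanted.fullmatch(rel):
                found.append(rel)
    return found


def demo():
    """Build a tiny made-up directory tree and walk it with pruning."""
    with tempfile.TemporaryDirectory() as root:
        for rel in (
            "raw/visit1/a.fits",
            "raw/visit2/b.fits",
            "raw/notes.txt",
            "scratch/junk/c.fits",
        ):
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        return find_wanted_files(
            root,
            wanted=re.compile(r"raw/[^/]+/[^/]+\.fits"),
            skip_dir=re.compile(r"scratch"),
        )
```

            Calling `demo()` walks only the `raw` subtree; the `scratch` directory is dropped from the traversal before any of its contents are examined, which is where the time saving would come from on a large repository.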
            jbosch Jim Bosch made changes -
            Reviewers: set to Hsin-Fang Chiang [ hchiang2 ]
            Status: In Progress [ 3 ] → In Review [ 10004 ]
            hchiang2 Hsin-Fang Chiang added a comment -

            Generally it looks okay to me, but TBH I'm having difficulty understanding some of the repo walker code; I roughly grasp the idea there and how it handles the kinds of folders and files in the filesystem, but I can't say I really understand it. I was wondering how detailed a review you would like, especially if I understand correctly that this code will go away once we transition to a pure Gen3 world. On the other hand, if this code will live longer than that, some unit tests would be nice.
            jbosch Jim Bosch added a comment -

            Thanks for the quick review. This code will indeed go away with Gen2, so I don't think a super-detailed review is necessary. However, if there's anything I can do now to make this code more understandable - perhaps primarily for someone trying to debug it, rather than someone trying to extend it - via comments, documentation, logging, etc., that would be good to address.
            jbosch Jim Bosch added a comment -

            I've just created a PR for ci_hsc_gen2, to override the configuration added to obs_subaru as suggested in the review. I also added another commit to obs_base to deal with the fact that Gen2 has some dataset types (config and metadata ones) that use the same templates.
            jbosch Jim Bosch added a comment -

            Hsin-Fang Chiang, did you want to take another look at this, or should I merge if it gets through Jenkins?
            hchiang2 Hsin-Fang Chiang added a comment -

            Okay to merge given that this is for a short-term practical need; otherwise it may need a more critical review. I also didn't try to run anything, but I gathered that you did already.
            hchiang2 Hsin-Fang Chiang made changes -
            Status: In Review [ 10004 ] → Reviewed [ 10101 ]
            jbosch Jim Bosch made changes -
            Resolution: set to Done [ 10000 ]
            Status: Reviewed [ 10101 ] → Done [ 10002 ]

              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              jbosch Jim Bosch
              Reviewers:
              Hsin-Fang Chiang
              Watchers:
              Hsin-Fang Chiang, Jim Bosch