Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: obs_base, obs_subaru
- Story Points: 4
- Epic Link:
- Team: Data Release Production
Description
The refactoring of the Gen2 conversion tools on DM-17023 broke the scripts we had previously put together for creating a Gen3 data repository containing the HSC RC2 dataset. At the time this wasn't considered worth fixing, because we still hadn't addressed the issues that made RC2 ingest and preflight painfully slow. That has now been partially done on DM-21768 (ingest should be faster), so it's time to get the RC2 repository creation working again.
This ticket is for getting the new Gen2 conversion tools working on RC2. After that's done, we may also want to make some (hopefully minor) improvements to the YAML import/export mechanism to provide a faster way of re-creating the Gen3 repo from scratch, given that much of the slowness in Gen2 conversion comes from limits on how fast we can discover related datasets in the Gen2 repo.
Issue Links
- is blocked by: DM-21768 Vectorize dataset insert API (Done)
Hsin-Fang Chiang, do you have time to take this review? The purpose of it all is to get Gen3 HSC RC2 bootstrap working again, but the only big changes are to the gen2to3 logic in obs_base to make it faster (by not descending into directories that don't contain anything we want). Changes in all other packages are simple, and I'd recommend looking at all of them commit-by-commit to get the clearest picture of the changes.
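For readers unfamiliar with the pruning idea mentioned above, here is a minimal sketch of a directory walk that avoids descending into unwanted subtrees. It is not the obs_base gen2to3 code: the function name `find_files` and the `skip_dirs` argument are hypothetical, and the only library feature it relies on is the standard `os.walk` behaviour that editing `dirnames` in place (with `topdown=True`) stops the walk from entering the removed directories.

```python
import os


def find_files(root, skip_dirs):
    """Return paths (relative to root) of all files under root,
    without descending into any directory whose name is in skip_dirs.

    Hypothetical helper for illustration only; not part of obs_base.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Editing dirnames in place tells os.walk which subdirectories
        # to visit next, so pruned subtrees are never scanned at all.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, root))
    return found
```

The saving comes from never listing the pruned subtrees, rather than listing everything and filtering afterwards, which matters when the skipped directories are large.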
Since Jira hasn't found them all, the PRs are: