Fix Version/s: None
Component/s: obs_base, obs_subaru
The refactoring of the Gen2 conversion tools on DM-17023 broke the scripts we had previously put together for creating a Gen3 data repository containing the HSC RC2 dataset. At the time this wasn't considered worth fixing, because we still hadn't addressed the issues that made RC2 ingest and preflight painfully slow. That has now been partially done on DM-21768 (i.e. ingest should now be faster), so it's time to get those scripts working again.
This ticket is for getting the new Gen2 conversion tools working on RC2. After that's done, we may also want to make some (hopefully minor) improvements to the YAML import/export mechanism to provide a faster way of re-creating the Gen3 repo from scratch, since much of the slowness in Gen2 conversion comes from limits on how quickly we can discover related datasets in the Gen2 repo.
- is blocked by DM-21768 Vectorize dataset insert API
|Status||To Do [ 10001 ]||In Progress [ 3 ]|
|Reviewers||Hsin-Fang Chiang [ hchiang2 ]|
|Status||In Progress [ 3 ]||In Review [ 10004 ]|
Generally it looks okay to me, but TBH I'm having difficulty understanding some of the repo walker code; I roughly grasp the idea and how it handles the different kinds of folders/files in the filesystem, but I can't say I really understand it. How detailed a review would you like, especially since, if I understand correctly, this code will go away once we transition to a pure Gen3 world? On the other hand, if this code will live longer than that, some unit tests would be nice.
Thanks for the quick review. This code will indeed go away with Gen2, so I don't think a super-detailed review is necessary. However, if there's anything I can do now to make this code more understandable (via comments, documentation, logging, etc.), primarily for someone trying to debug it rather than someone trying to extend it, that would be good to address.
I've just created a PR for ci_hsc_gen2 to override the configuration added to obs_subaru, as suggested in the review. I also added another commit to obs_base to deal with the fact that Gen2 has some dataset types (config and metadata ones) that use the same templates.
Hsin-Fang Chiang, did you want to take another look at this, or should I merge if it gets through Jenkins?
Okay to merge, given that this is for a short-term practical need; otherwise it might need a more critical review. I also didn't try to run anything, but I gather you already have.
|Status||In Review [ 10004 ]||Reviewed [ 10101 ]|
|Resolution||Done [ 10000 ]|
|Status||Reviewed [ 10101 ]||Done [ 10002 ]|
Hsin-Fang Chiang, do you have time to take on this review? The purpose of it all is to get the Gen3 HSC RC2 bootstrap working again, but the only big changes are to the gen2to3 logic in obs_base, which make it faster by not descending into directories that don't contain anything we want. The changes in all other packages are simple, and I'd recommend reviewing all of them commit-by-commit to get the clearest picture.
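For anyone trying to debug the walker, the general idea of "not descending into directories that don't contain anything we want" can be illustrated with a minimal sketch. This is not the obs_base gen2to3 code: `walk_pruned` and the simple glob-style patterns (e.g. `raw/*/*.fits`) are hypothetical stand-ins for the real Gen2 path templates. It relies on the documented behaviour of `os.walk` in top-down mode, where editing `dirnames` in place stops the walk from entering the removed subdirectories.

```python
import fnmatch
import os
from typing import Iterator, Sequence


def walk_pruned(root: str, patterns: Sequence[str]) -> Iterator[str]:
    """Yield '/'-separated paths (relative to ``root``) of files matching any
    of ``patterns``, never descending into directories no pattern can match.

    ``patterns`` are hypothetical glob-style templates such as "raw/*/*.fits";
    each '/'-separated component is matched against one path level.
    """
    split = [p.split("/") for p in patterns]
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == "." else rel.split(os.sep)
        depth = len(parts)
        # Patterns that are deeper than this directory and whose leading
        # components match every level of the path walked so far.
        live = [s for s in split
                if len(s) > depth
                and all(fnmatch.fnmatchcase(p, q) for p, q in zip(parts, s))]
        # Prune in place: keep only subdirectories that some live pattern
        # still allows at the next level; os.walk skips everything else.
        dirnames[:] = [d for d in dirnames
                       if any(len(s) > depth + 1
                              and fnmatch.fnmatchcase(d, s[depth])
                              for s in live)]
        # A file matches if a live pattern ends exactly at this level.
        for name in filenames:
            if any(len(s) == depth + 1 and fnmatch.fnmatchcase(name, s[depth])
                   for s in live):
                yield "/".join(parts + [name])
```

With `["raw/*/*.fits"]`, a top-level `calib/` directory is never entered, and neither is anything nested below `raw/<visit>/`. The real walker has to cope with many templates at once, which is why pruning early pays off on a repo the size of RC2.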
Since Jira hasn't found them all, the PRs are: