Fix Version/s: None
(original description is no longer accurate; see comments)
Registry's provenance tables - execution, run, and quantum - can't currently be fully populated without database updates (i.e. UPDATE statements after the initial INSERT). For example: you can't insert a dataset until after its run has been inserted; you can't insert a run until you've inserted the execution it inherits its ID from; and you can't insert an execution until after it's completed, because it has an end-timestamp field. Work through low-level use cases for these tables and ensure we can actually have all of their values when it's time to insert them, splitting up tables as necessary.
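The ordering problem above can be sketched with a deliberately simplified, hypothetical version of these tables (the table and column names here are illustrative, not the real daf_butler schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

# Simplified stand-ins for the provenance tables described above.
conn.execute("""
    CREATE TABLE execution (
        id INTEGER PRIMARY KEY,
        end_time TEXT NOT NULL  -- unknown until the execution finishes
    )
""")
conn.execute("""
    CREATE TABLE run (
        execution_id INTEGER PRIMARY KEY REFERENCES execution (id)
    )
""")
conn.execute("""
    CREATE TABLE dataset (
        id INTEGER PRIMARY KEY,
        run_id INTEGER NOT NULL REFERENCES run (execution_id)
    )
""")

# Inserting a dataset before its run (and hence its execution) exists fails:
blocked = False
try:
    conn.execute("INSERT INTO dataset (id, run_id) VALUES (1, 1)")
except sqlite3.IntegrityError:
    blocked = True

# So the execution row must be written first -- but with end_time NOT NULL,
# that can only happen after the execution completes, unless the table is
# split or the column is made nullable and UPDATEd later.
```

This is why the choice is between allowing UPDATEs to these tables and splitting them so that late-arriving values (like `end_time`) live in rows that can be inserted on their own.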
Christopher Stephens [X] : this ticket exists because I'm assuming updates are a problem. Please let me know if they aren't, or if you have any expectations/requirements/wisdom on when various provenance records should be inserted relative to the datasets they refer to.
- is contained by
DM-21231 Refactor Registry handling of dataset and associated tables
I'm repurposing this ticket (slightly) to bring the handling of collections, runs, and quanta in line with the new prototype. That will involve:
- Removing the Run class, and always using strings to refer to runs in public APIs.
- Removing the Execution class, and moving its attributes into Run and Quantum.
- Inventing the high-level Registry APIs for working with these entities (the mid-level prototype APIs inform these, but do not specify them), and replacing the existing Registry APIs with them.
- Adjusting the Quantum class to work with the new Registry APIs and the mid-level prototype APIs.
This will involve some minor changes to the schema, but the big changes will be reserved until the Registry backend work is out of the way. For now we'll probably just let the string name of a run be its primary key, and add a surrogate integer ID later. The focus for this ticket is on getting the primitives in shape so that we can develop the new Registry against them without breaking the existing one, and to a lesser extent on getting the public Registry APIs closer to their final form.
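As a rough sketch of the interim schema and API shape described above (this is a hypothetical illustration, not the actual daf_butler schema or Registry signature): the run's string name serves as its primary key for now, Execution's timing attributes move onto the run row, and public APIs take plain strings rather than Run objects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Interim schema sketch: name is the primary key; a surrogate integer ID
# can be added later without changing how callers refer to runs.
conn.execute("""
    CREATE TABLE run (
        name TEXT PRIMARY KEY,
        start_time TEXT,  -- formerly attributes of the Execution class
        end_time TEXT,
        host TEXT
    )
""")

# Callers would pass a plain string, e.g. a hypothetical
# registry.registerRun("HSC/calib") instead of registry.makeRun(Run(...)).
conn.execute("INSERT INTO run (name) VALUES (?)", ("HSC/calib",))
name, = conn.execute("SELECT name FROM run").fetchone()
```

Keying on the string name keeps the primitives simple for now; swapping in a surrogate integer key later is an internal schema change rather than an API change, since public APIs only ever see the name.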
Mikolaj Kowalik, do you mind taking this review? I'm trying to spread daf_butler reviews around more, and this one both doesn't require much prior knowledge of the codebase and touches a part that's relevant for interfacing with workflow-management code, so it seemed like a good one for you.
Changes are in four packages:
Only the daf_butler changes are nontrivial, and even most of those are quite mechanical.
It looks to me like any updates would be to logging-type tables (mainly execution.end_time). I don't see any issues with allowing this.