Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: None
- Labels:
- Story Points: 8
- Epic Link:
- Sprint: BG3_F18_09, BG3_F18_10
- Team: Data Access and Database
Description
Recent changes in the gen3 butler require a few updates to the execution framework. I think the piece responsible for running the task should still work (though it has not been tested recently), but the post-pre-flight part needs to do some additional work:
- make sure that the output collection exists in the registry
- copy/associate (if needed) all input datasets into the output collection; this probably needs to be done recursively
- provenance information (Quanta) needs to be saved in the registry
I probably need more realistic examples of PipelineTask to test this properly.
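The three post-pre-flight steps above could be sketched roughly as follows. This is a minimal in-memory mock, not the real daf_butler API: `DatasetRef`, `Registry`, and `prepare_output_collection` here are simplified stand-ins, and the recursion over `components` assumes composite datasets carry references to their parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRef:
    """Simplified stand-in for a gen3 DatasetRef."""
    dataset_type: str
    data_id: tuple        # e.g. (("tract", 100), ("patch", 42))
    components: tuple = ()  # parts of a composite dataset, if any


class Registry:
    """Toy in-memory registry; the real Registry is backed by SQL."""

    def __init__(self):
        self.collections = {}  # collection name -> set of DatasetRef
        self.quanta = []       # recorded provenance

    def ensure_collection(self, name):
        self.collections.setdefault(name, set())

    def associate(self, collection, ref):
        # Associate a dataset and, recursively, all of its components.
        refs = self.collections[collection]
        if ref in refs:
            return
        refs.add(ref)
        for component in ref.components:
            self.associate(collection, component)


def prepare_output_collection(registry, collection, input_refs, quanta):
    # Step 1: make sure the output collection exists in the registry.
    registry.ensure_collection(collection)
    # Step 2: copy/associate all input datasets (recursively).
    for ref in input_refs:
        registry.associate(collection, ref)
    # Step 3: save provenance information (Quanta).
    registry.quanta.extend(quanta)
```

With a composite input, associating the parent pulls its components into the output collection as well.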
Kian-Tat Lim, I was trying to imagine a way to make it work in the current schema. I agree that extending the schema with a "canonical id" would work, but there are non-trivial issues with that:
One implementation of a canonical id that I can imagine is just a string representation of a DatasetRef (its DatasetType plus the DataId part, e.g. Patch(patch=42,skymap=MySkyMap,tract=100)). This would have to be done very carefully to avoid ambiguities and keep it compatible w.r.t. potential schema changes (which is hard when you cannot predict the future).
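A naive version of that string form might look like the sketch below (the function name is mine, not from any existing API). Sorting the DataId keys makes the representation order-independent, but the ambiguity concern stands: values containing "," or "=" would need escaping, and any schema change to DataId keys would invalidate stored ids.

```python
def canonical_id(dataset_type, data_id):
    """Hypothetical canonical string for a DatasetRef: the DatasetType
    name plus DataId key=value pairs in sorted key order, so the same
    ref always serializes identically regardless of dict ordering.

    Note: no escaping is done, so values containing ',' or '=' would
    be ambiguous -- one of the issues mentioned above.
    """
    parts = ",".join(f"{k}={data_id[k]}" for k in sorted(data_id))
    return f"{dataset_type}({parts})"
```

For the example in the text: `canonical_id("Patch", {"patch": 42, "skymap": "MySkyMap", "tract": 100})` yields `"Patch(patch=42,skymap=MySkyMap,tract=100)"`.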
Still, I agree with one thing: we need a table-level constraint check for this, otherwise things will get very ugly. Implementing that kind of thing is beyond the scope of this ticket; what I want to do here is add a trivial check that works in a single-user environment, basically more or less the same thing that we have today in addDataset(), but implemented more efficiently.
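One way to make the single-user check more efficient than today's per-dataset addDataset() lookup is to check a whole batch with one query before inserting. The sketch below assumes a hypothetical one-column `dataset` table keyed by a canonical-id string; without a real table-level constraint it is only safe when nothing else writes concurrently.

```python
import sqlite3


def add_datasets(conn, ids):
    """Batch duplicate check: one SELECT for the whole batch instead of
    one per dataset, then a bulk insert.

    ids: iterable of canonical-id strings (hypothetical schema with a
    single 'dataset' table). Single-user only: a concurrent writer
    could insert a duplicate between the check and the insert.
    """
    ids = list(ids)
    if not ids:
        return
    qmarks = ",".join("?" * len(ids))
    existing = {row[0] for row in conn.execute(
        f"SELECT dataset_id FROM dataset WHERE dataset_id IN ({qmarks})",
        ids)}
    if existing:
        raise ValueError(f"datasets already registered: {sorted(existing)}")
    conn.executemany("INSERT INTO dataset (dataset_id) VALUES (?)",
                     [(i,) for i in ids])
```

This trades the race-safety of a table-level constraint for fewer round trips, which is exactly the single-user compromise described above.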