Details
- Type: Story
- Status: To Do
- Resolution: Unresolved
- Fix Version/s: None
- Component/s: pipe_base
- Labels:
- Team: Data Release Production
- Urgent?: No
Description
The QuantumGraph butler-export logic developed for the execution butler system has a subtle flaw: it exports data IDs (and hence DimensionRecords) for all datasets, but that can miss some "relationship" DimensionElements like "visit_definition", which only appear in data IDs that have both of the dimensions they relate ("exposure" and "visit").
To fix this, it should be sufficient to extend each dataset's data ID with any key-value pairs also present in its quantum's data ID, and then export that. Something like this:
full = registry.expandDataId(dataset.dataId, **quantum.dataId.byName())
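Why merging the keys matters can be illustrated with a toy model. This is only a sketch: plain dicts and sets stand in for data IDs and dimension elements, the `ELEMENT_REQUIRES` table is a simplified assumption about which dimensions each element needs, and the helper names and the instrument and ID values are hypothetical rather than part of daf_butler.

```python
# Toy model only: plain dicts and sets stand in for daf_butler data IDs and
# dimension elements; none of these names are part of the real API.

# Each element maps to the dimensions a data ID must contain for that
# element's records to be picked up during export (simplified assumption).
ELEMENT_REQUIRES = {
    "instrument": {"instrument"},
    "exposure": {"instrument", "exposure"},
    "visit": {"instrument", "visit"},
    "visit_definition": {"instrument", "exposure", "visit"},
}


def elements_covered(data_id):
    """Return the elements whose required dimensions are all keys of data_id."""
    keys = set(data_id)
    return {name for name, required in ELEMENT_REQUIRES.items() if required <= keys}


def merge_data_ids(dataset_data_id, quantum_data_id):
    """Extend the dataset's data ID with the quantum's key-value pairs."""
    merged = dict(dataset_data_id)
    for key, value in quantum_data_id.items():
        # Shared keys must agree; a mismatch means the IDs are unrelated.
        if merged.setdefault(key, value) != value:
            raise ValueError(f"conflicting values for {key!r}")
    return merged


# Illustrative values: an input dataset keyed by exposure, a quantum keyed by visit.
dataset_data_id = {"instrument": "HSC", "exposure": 903334}
quantum_data_id = {"instrument": "HSC", "visit": 903334}

# Exporting each data ID separately versus exporting the merged data ID.
alone = elements_covered(dataset_data_id) | elements_covered(quantum_data_id)
merged = elements_covered(merge_data_ids(dataset_data_id, quantum_data_id))
print(sorted(merged - alone))  # ['visit_definition']
```

In this model, neither data ID on its own carries both "exposure" and "visit", so the relationship element only shows up once the two are combined.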
Unfortunately, that will be super slow to run on all of the quanta in a big graph. We probably ought to think about ways to save these records in the QG itself to make it more self-sufficient; we get them (moderately efficiently, in bulk) at QG generation, and then throw them away. It's either that or wait until we can get bulk data ID expansion working on DM-30438.
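The "save the records in the QG" option might look roughly like the following. This is a hypothetical sketch, not a design for QuantumGraph: `ToyGraph`, `add_record`, and `export_records` are made-up names, and the record payload is a placeholder dict rather than a real DimensionRecord.

```python
# Hypothetical sketch: ToyGraph and its methods are illustrative names,
# not the QuantumGraph API.
from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    # Records kept from graph generation, keyed by
    # (element name, sorted data ID items).
    records: dict = field(default_factory=dict)

    def add_record(self, element, data_id, record):
        """Store a record fetched in bulk while the graph is being built."""
        self.records[(element, tuple(sorted(data_id.items())))] = record

    def export_records(self):
        """Return every stored record without any per-data-ID registry lookup."""
        return dict(self.records)


graph = ToyGraph()
graph.add_record(
    "visit_definition",
    {"instrument": "HSC", "exposure": 903334, "visit": 903334},
    {"note": "placeholder for a real record"},
)
exported = graph.export_records()
print(len(exported))  # 1
```

The point of the sketch is only that export then reads from what the graph already holds, instead of expanding one data ID at a time against the registry.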
Issue Links
- relates to: DM-30438 Add support for uploading data IDs to temporary tables and vectorize data ID expansion (To Do)