The dimension package introduced in
DM-15034 and modified in DM-15675 are a big improvement on what we had before, but the logic for constructing and expanding DataIds in particular is getting quite complex and hard to follow. Issues include:
- Use of __new__ for DataId to avoid unnecessary copies
- (Intentional) pointer aliasing in DataId.entries.
- Upgrading and downgrading of implied-dependency links from DataId.entries to the main dictionary.
- Lots of code to traverse DimensionGraphs in different ways in expandDataId, SqlPreFlight, and DimensionGraph itself, none of which can currently be expressed as an iterator.
- Nomenclature is still confusing: because a Dimension instance represents a table, we're stuck with "entry" as a row in that table, and "link" as its primary key.
These work well and solve real problems, but they also make it hard to reason about what's going on.
There are also some problems on the horizon that we need to solve; any major refactor at this stage should at least consider these as well:
- The ExposureRange Dimension is totally special-cased in preflight right now, because we need different entries for different DatasetTypes within the same Quantum. But this will also be true of other Dimensions in future PipelineTasks (especially if we ever unify Exposure and Visit into Observation).
- We need to add a level of indirection between data ID keys and Dimension field names, to permit different Instruments to have different preferred observation IDs associated with different database fields.
- We need to add support for per-Instrument Dimension metadata tables.
- We'd like to remove the need for PipelineTasks to deal with both uppercase Dimension names and lower-case link names, since there's almost a one-to-one mapping between them (and there will be once ExposureRange is turned into a join for CalibIdentifier).
This is not a high priority, as we don't need any of this to get the current set of PipelineTasks working (though it may let us avoid hacks in CPP PipelineTasks).