Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: daf_butler
- Labels:
- Story Points: 35
- Epic Link:
- Team: Data Release Production
Description
The dimension package introduced in DM-15034 and modified in DM-15675 is a big improvement on what we had before, but the logic for constructing and expanding DataIds in particular is getting quite complex and hard to follow. Issues include:
- Use of __new__ for DataId to avoid unnecessary copies (see the sketch after this list).
- (Intentional) pointer aliasing in DataId.entries.
- Upgrading and downgrading of implied-dependency links from DataId.entries to the main dictionary.
- Lots of code to traverse DimensionGraphs in different ways in expandDataId, SqlPreFlight, and DimensionGraph itself, none of which can currently be expressed as an iterator.
- Nomenclature is still confusing: because a Dimension instance represents a table, we're stuck with "entry" as a row in that table, and "link" as its primary key.
These work well and solve real problems, but they also make it hard to reason about what's going on.
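To make the first bullet above concrete, here is a minimal, hypothetical sketch of the copy-avoidance pattern it refers to; the class name FrozenId and its attributes are invented for this illustration and are not the actual DataId implementation:

```python
class FrozenId:
    """Immutable mapping-like ID; a simplified stand-in, not the real DataId."""

    def __new__(cls, arg):
        # Copy-avoidance trick: if the argument is already an instance,
        # hand it back unchanged instead of building a new object.
        if isinstance(arg, cls):
            return arg
        self = super().__new__(cls)
        self._values = dict(arg)
        return self

    def __getitem__(self, key):
        return self._values[key]


a = FrozenId({"instrument": "HSC", "visit": 903334})
b = FrozenId(a)
assert b is a  # the "copy" is actually the same object
```

The pattern saves copies, but it means the constructor sometimes aliases an existing object rather than creating a new one, which is exactly the kind of behavior that is hard to reason about.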
There are also some problems on the horizon that we need to solve; any major refactor at this stage should at least consider these as well:
- The ExposureRange Dimension is totally special-cased in preflight right now, because we need different entries for different DatasetTypes within the same Quantum. But this will also be true of other Dimensions in future PipelineTasks (especially if we ever unify Exposure and Visit into Observation).
- We need to add a level of indirection between data ID keys and Dimension field names, to permit different Instruments to have different preferred observation IDs associated with different database fields (see the sketch below).
- We need to add support for per-Instrument Dimension metadata tables.
- We'd like to remove the need for PipelineTasks to deal with both uppercase Dimension names and lower-case link names, since there's almost a one-to-one mapping between them (and there will be once ExposureRange is turned into a join for CalibIdentifier).
This is not a high priority, as we don't need any of this to get the current set of PipelineTasks working (though it may let us avoid hacks in CPP PipelineTasks).
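As a sketch of the key-to-field indirection mentioned above: the names here (OBSERVATION_ID_FIELDS, resolve_data_id_key, and the column names) are invented for this illustration and are not part of daf_butler.

```python
# Hypothetical per-instrument configuration: which database column backs
# the user-facing observation-ID key for each instrument.
OBSERVATION_ID_FIELDS = {
    "HSC": "visit_id",
    "LSSTCam": "exposure_name",
}


def resolve_data_id_key(instrument: str, key: str) -> str:
    """Translate a user-facing data ID key into a database field name."""
    if key == "observation":
        # Fall back to a generic column when an instrument has no override.
        return OBSERVATION_ID_FIELDS.get(instrument, "exposure_id")
    return key


assert resolve_data_id_key("HSC", "observation") == "visit_id"
assert resolve_data_id_key("LSSTCam", "detector") == "detector"
```

The point is only that the key a user types in a data ID need not be the same string as the Dimension field it resolves to; a real implementation would presumably tie this to the Dimension definitions rather than a module-level dict.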
Issue Links
- blocks
  - DM-21231 Refactor Registry handling of dataset and associated tables (Done)
- contains
  - DM-19888 Reduce view usage in QuantumGraph queries (Invalid)
  - DM-21093 Replace fragile raw SQL inserts in ci_hsc_gen3 (Invalid)
  - DM-21125 Gen3 ingest tests attempt to write into testdata package directories (Invalid)
  - DM-13990 Add support for different SkyPix systems to Registry DataUnit schema (Invalid)
  - DM-15411 Move gen2convert subpackage out of daf_butler (Invalid)
- is blocked by
  - DM-21410 Turn ci_hsc into a metapackage for ci_hsc_gen2 and ci_hsc_gen3 (Done)
  - DM-16539 Add level of indirection between calib identifier and exposure range in schema (Done)
- relates to
  - DM-21420 ap_verify datasets have out-of-date refcat configs (Done)
  - DM-20054 Normalize dimensions in DatasetType and config mappings up-front (Done)
  - DM-17025 Improve filename template mechanisms in PosixDatastore and Butler (Done)
  - DM-15034 Custom classes for DataUnit tuples/sets and Data IDs (Done)
  - DM-15675 Make sure data IDs are expanded when adding Datasets and filling templates (Done)
  - DM-17663 Make Registry table names lowercase (Done)
  - DM-19851 Improve multi-collection query in QuantumGraph generator (Invalid)
  - DM-17154 Move Registry schema definition from YAML to Python (Invalid)
  - DM-18892 Investigate deletes in Gen3 ci_hsc conversion or processing (Invalid)
The megaticket is almost passing Jenkins and is ready for final review. Once again I'm trying to split it up between Tim Jenness and Andy Salnikov, and since there are a lot of packages but also a lot of code that has already been reviewed, I'll point out which commits are new and who I'd like to look at them:
It's also possible there will be an obs_lsst branch added later, if Tim lands Gen3 support there before I can break his branch.