Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-17023

Refactor the Dimensions and query system

    XMLWordPrintable

    Details

      Description

      The dimension package introduced in DM-15034 and modified in DM-15675 are a big improvement on what we had before, but the logic for constructing and expanding DataIds in particular is getting quite complex and hard to follow.  Issues include:

      • Use of __new__ for DataId to avoid unnecessary copies
      • (Intentional) pointer aliasing in DataId.entries.
      • Upgrading and downgrading of implied-dependency links from DataId.entries to the main dictionary.
      • Lots of code to traverse DimensionGraphs in different ways in expandDataId, SqlPreFlight, and DimensionGraph itself, none of which can currently be expressed as an iterator.
      • Nomenclature is still confusing: because a Dimension instance represents a table, we're stuck with "entry" as a row in that table, and "link" as its primary key.

      These work well and solve real problems, but they also make it hard to reason about what's going on.

      There are also some problems on the horizon that we need to solve; any major refactor at this stage should at least consider these as well:

      • The ExposureRange Dimension is totally special-cased in preflight right now, because we need different entries for different DatasetTypes within the same Quantum. But this will also be true of other Dimensions in future PipelineTasks (especially if we ever unify Exposure and Visit into Observation).
      • We need to add a level of indirection between data ID keys and Dimension field names, to permit different Instruments to have different preferred observation IDs associated with different database fields.
      • We need to add support for per-Instrument Dimension metadata tables.
      • We'd like to remove the need for PipelineTasks to deal with both uppercase Dimension names and lower-case link names, since there's almost a one-to-one mapping between them (and there will be once ExposureRange is turned into a join for CalibIdentifier).

      This is not a high priority, as we don't need any of this to get the current set of PipelineTasks working (though it may let us avoid hacks in CPP PipelineTasks).

        Attachments

          Issue Links

            Activity

            Hide
            jbosch Jim Bosch added a comment -

            Some ideas to try:

            • Re-frame DimensionGraph with Dimensions as nodes and DimensionJoins as a special kind of edge; add traversal iterators of various types.
            • Make DataId standardization go through a factory function instead of the constructor to avoid the need for _new_. Might need to make this a registry method to provide automatic expansion and that extra level of indirection discussed above; are there contexts in which we need to standardize a DataId but don't have a Registry?
            • DataId (and perhaps DimensionGraph) should more explicitly track which implied dimension links are known. Or perhaps (if standardization always goes through a Registry) all implied dimension links should always be known?
            Show
            jbosch Jim Bosch added a comment - Some ideas to try: Re-frame DimensionGraph with Dimensions as nodes and DimensionJoins as a special kind of edge; add traversal iterators of various types. Make DataId standardization go through a factory function instead of the constructor to avoid the need for _ new _. Might need to make this a registry method to provide automatic expansion and that extra level of indirection discussed above; are there contexts in which we need to standardize a DataId but don't have a Registry? DataId (and perhaps DimensionGraph) should more explicitly track which implied dimension links are known. Or perhaps (if standardization always goes through a Registry) all implied dimension links should always be known?
            Hide
            jbosch Jim Bosch added a comment -

            Should also address here a comment from Nate Lust on DM-17048:

            Not a problem for this ticket, but it would be nice to have a ticket for someone to go back and add a lot of comment lines describing what is happening and why in conceptual terms, as it is hard to do anything but see the code right in front of you with no understanding when someone opens this package.

            I believe this most pertains to selectDimensions, but it's good advice more broadly as well.

            Show
            jbosch Jim Bosch added a comment - Should also address here a comment from Nate Lust on DM-17048 : Not a problem for this ticket, but it would be nice to have a ticket for someone to go back and add a lot of comment lines describing what is happening and why in conceptual terms, as it is hard to do anything but see the code right in front of you with no understanding when someone opens this package. I believe this most pertains to selectDimensions , but it's good advice more broadly as well.
            Hide
            jbosch Jim Bosch added a comment -

            Right now ExposureRange is the only Dimension with two keys, but we'd want to fix that (DM-16539) even if we didn't want to eliminate the distinction between dimension name and link name.  Marking that as a blocker.

            Show
            jbosch Jim Bosch added a comment - Right now ExposureRange is the only Dimension with two keys, but we'd want to fix that ( DM-16539 ) even if we didn't want to eliminate the distinction between dimension name and link name.  Marking that as a blocker.
            Hide
            jbosch Jim Bosch added a comment -

            This ticket is still a major refactor of the dimensions system, and many of the original motivating factors are still very much motivators, but the "ideas to try" are pretty out of date, and some of the motivating factors have already been addressed on other tickets.  A more complete writeup of what will be changing is forthcoming - I'll be putting as much of that makes sense in as actual code documentation that describes the new system (not so much as a diff against the old one).  In the meantime, take the original description and previous comments here with a grain of salt.

            Show
            jbosch Jim Bosch added a comment - This ticket is still a major refactor of the dimensions system, and many of the original motivating factors are still very much motivators, but the "ideas to try" are pretty out of date, and some of the motivating factors have already been addressed on other tickets.  A more complete writeup of what will be changing is forthcoming - I'll be putting as much of that makes sense in as actual code documentation that describes the new system (not so much as a diff against the old one).  In the meantime, take the original description and previous comments here with a grain of salt.
            Hide
            jbosch Jim Bosch added a comment -

            The megaticket is almost passing Jenkins and is ready for final review.  Once again I'm trying to split it up between Tim Jenness and Andy Salnikov, and since there are a lot of packages but also a lot of code that has already been reviewed, I'll point out the commits that are new and who I'd like to look at them:

            It's also possible there will be an obs_lsst branch added later, if Tim lands Gen3 support there before I can break his branch .

            Show
            jbosch Jim Bosch added a comment - The megaticket is almost passing Jenkins and is ready for final review.  Once again I'm trying to split it up between Tim Jenness and Andy Salnikov , and since there are a lot of packages but also a lot of code that has already been reviewed, I'll point out the commits that are new and who I'd like to look at them: daf_butler : new commits for both; see comment on PR. obs_base : already fully reviewed (by Tim); only changes are addressing the last review. pipe_base , meas_base , pipe_tasks , skymap : Andy, all commits new. obs_subaru : mostly already reviewed by Tim; one new commit for Andy. obs_decam : Tim, all commits new. ctrl_mpexec : Andy, all commits new. ci_hsc_gen2 : already fully reviewed (by Tim); only changes are addressing the last review. ci_hsc_gen3 : Tim, all commits new. It's also possible there will be an obs_lsst branch added later, if Tim lands Gen3 support there before I can break his branch .
            Hide
            salnikov Andy Salnikov added a comment -

            I checked all PRs that are assigned to me, everything looks good. Removing myself from reviewers.

            Show
            salnikov Andy Salnikov added a comment - I checked all PRs that are assigned to me, everything looks good. Removing myself from reviewers.
            Hide
            tjenness Tim Jenness added a comment -

            I had some minor comments on the pull requests. This is an amazing piece of work.

            Show
            tjenness Tim Jenness added a comment - I had some minor comments on the pull requests. This is an amazing piece of work.
            Hide
            jbosch Jim Bosch added a comment - - edited

            Alrighty, three more packages with changes, so one more extra-final set of (trivial) reviews:

             

            Show
            jbosch Jim Bosch added a comment - - edited Alrighty, three more packages with changes, so one more extra-final set of (trivial) reviews: Andy Salnikov : obs_cfht , obs_ctio0m9 (more of the same config overrides for reference catalogs you saw before) Tim Jenness : obs_lsst (config overrides for reference catalogs and API change updates for the new Instrument classes)  
            Hide
            salnikov Andy Salnikov added a comment -

            I approved both PRs.

            Show
            salnikov Andy Salnikov added a comment - I approved both PRs.

              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              jbosch Jim Bosch
              Reviewers:
              Tim Jenness
              Watchers:
              Andy Salnikov, Jim Bosch, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.