Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-16539

Add level of indirection betwen calib identifier and exposure range in schema

    XMLWordPrintable

    Details

    • Story Points:
      4
    • Sprint:
      BG3_F18_11, BG3_S19_01
    • Team:
      Data Release Production

      Description

      The use case for creating calibration products requires making creation of the Dataset distinct from (and prior to) the identification of its validity range.  That suggests we should replace the ExposureRange DataUnit with something like a "CalibIdentifier" (currently imagines as a string provided explicitly when running calib-creation PipelineTasks).  CalibIdentifier would also serve as the primary key of a new table with validity range fields, which would be used to define a view for a traditional DataUnitJoin between CalibIdentifier and Exposure.  That should remove the need for some special-casing of ExposureRange in preflight added on DM-16482, and make it easier to refactor DataUnit code later.

        Attachments

          Issue Links

            Activity

            Hide
            salnikov Andy Salnikov added a comment -

            Jim Bosch, no worries, I'll make time for review and anything critical that depends on me. Will merge DM-17720 soon so you can switch to master.

            Show
            salnikov Andy Salnikov added a comment - Jim Bosch , no worries, I'll make time for review and anything critical that depends on me. Will merge  DM-17720 soon so you can switch to master.
            Hide
            jbosch Jim Bosch added a comment -

            So, I thought I was just about done with this, but it turns out my fix (which involves an extra join inside each Dataset subquery) brings the full suite of DRP PipelineTasks back over the SQLite 64-join limit.

            I think I'm going to fix that by introducing a new field in the DatasetTypeDescriptors returned by PipelineTask for "expected" input datasets: these would be considered optional in the query (so they don't constrain the graph), but would cause a hard error if they're missing when we try to build the QuantumGraph.  That would let us defer the lookup the way we can for outputs, and happily I think it's actually more desirable behavior for all of the calibration datasets that are causing these problems.

            Show
            jbosch Jim Bosch added a comment - So, I thought I was just about done with this, but it turns out my fix (which involves an extra join inside each Dataset subquery) brings the full suite of DRP PipelineTasks back over the SQLite 64-join limit. I think I'm going to fix that by introducing a new field in the DatasetTypeDescriptors returned by PipelineTask for "expected" input datasets: these would be considered optional in the query (so they don't constrain the graph), but would cause a hard error if they're missing when we try to build the QuantumGraph.  That would let us defer the lookup the way we can for outputs, and happily I think it's actually more desirable behavior for all of the calibration datasets that are causing these problems.
            Hide
            jbosch Jim Bosch added a comment -

            Andy Salnikov, I think this is finally ready for review.  There are major changes to daf_butler and much less in the other packages (though the pipe_base ones aren't trivial, either).  I'm sorry the daf_butler commits aren't in better shape, but I don't see any easy way to restructure them in a way that would improve that with a lot of effort.  Given that, I wouldn't recommend reviewing commit-by-commit, though you should probably look at the commit messages before diving in.

            The full set of PRs is:

             

            Show
            jbosch Jim Bosch added a comment - Andy Salnikov , I think this is finally ready for review.  There are major changes to daf_butler and much less in the other packages (though the pipe_base ones aren't trivial, either).  I'm sorry the daf_butler commits aren't in better shape, but I don't see any easy way to restructure them in a way that would improve that with a lot of effort.  Given that, I wouldn't recommend reviewing commit-by-commit, though you should probably look at the commit messages before diving in. The full set of PRs is: daf_butler: https://github.com/lsst/daf_butler/pull/135 pipe_base: https://github.com/lsst/pipe_base/pull/90 obs_subaru: https://github.com/lsst/obs_subaru/pull/182 ctrl_mpexec: https://github.com/lsst/ctrl_mpexec/pull/14 pipe_tasks: https://github.com/lsst/pipe_tasks/pull/279 ip_isr: https://github.com/lsst/ip_isr/pull/79 ci_hsc: https://github.com/lsst/ci_hsc/pull/74  
            Hide
            salnikov Andy Salnikov added a comment -

            Looks OK, there are few minor comments on daf_butler and pipe_base. 

            Show
            salnikov Andy Salnikov added a comment - Looks OK, there are few minor comments on daf_butler and pipe_base. 
            Hide
            jbosch Jim Bosch added a comment -

            All branches merged to master.

            Show
            jbosch Jim Bosch added a comment - All branches merged to master.

              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              jbosch Jim Bosch
              Reviewers:
              Andy Salnikov
              Watchers:
              Andy Salnikov, Jim Bosch
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.