Data Management / DM-23701

pipetask-produced DOT for pipelines should show prerequisite inputs

    Details

    • Story Points:
      1
    • Sprint:
      DB_S20_02
    • Team:
      Data Access and Database

      Description

      The DOT graph currently produced for pipelines does not include prerequisite inputs. It would be nice to add them to the graph while also making them distinguishable from regular inputs.

        Attachments

          Activity

          Andy Salnikov added a comment:

          Trying to just add prerequisiteInputs in the same way as regular inputs gives me a crash:

          Failed to build pipeline: 'skypix'
          Traceback (most recent call last):
            File "/software/lsstsw/stack_20200220/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/ipdb/__main__.py", line 169, in main
              pdb._runscript(mainpyfile)
            File "/software/lsstsw/stack_20200220/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/pdb.py", line 1568, in _runscript
              self.run(statement)
            File "/software/lsstsw/stack_20200220/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/bdb.py", line 585, in run
              exec(cmd, globals, locals)
            File "<string>", line 1, in <module>
            File "/project/salnikov/gen3-middleware/ctrl_mpexec/bin/pipetask", line 24, in <module>
              import sys
            File "/project/salnikov/gen3-middleware/ctrl_mpexec/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 144, in parseAndRun
              pipeline = self.makePipeline(args)
            File "/project/salnikov/gen3-middleware/ctrl_mpexec/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 266, in makePipeline
              pipeline2dot(pipeline, args.pipeline_dot)
            File "/project/salnikov/gen3-middleware/ctrl_mpexec/python/lsst/ctrl/mpexec/dotTools.py", line 216, in pipeline2dot
              dsType = attr.makeDatasetType(universe)
            File "/project/salnikov/gen3-middleware/pipe_base/python/lsst/pipe/base/connectionTypes.py", line 123, in makeDatasetType
              universe.extract(self.dimensions),
            File "/project/salnikov/gen3-middleware/daf_butler/python/lsst/daf/butler/core/dimensions/universe.py", line 160, in extract
              return DimensionGraph(universe=self, names=names)
            File "/project/salnikov/gen3-middleware/daf_butler/python/lsst/daf/butler/core/dimensions/graph.py", line 126, in __new__
              names.update(universe[name]._recursiveDependencyNames)
            File "/project/salnikov/gen3-middleware/daf_butler/python/lsst/daf/butler/core/dimensions/graph.py", line 252, in __getitem__
              return self.elements[name]
            File "/project/salnikov/gen3-middleware/daf_butler/python/lsst/daf/butler/core/utils.py", line 581, in __getitem__
              return self._dict[name]
          KeyError: 'skypix'
          

          I think 'skypix' is somewhat special and may need separate handling in connection types.
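A toy model shows why the lookup fails. It is hypothetical and is not the daf_butler DimensionUniverse API: the "universe" here is a plain dict keyed only by concrete dimension names (including an assumed SkyPix level, "htm7"), so expanding ("skypix",) hits a missing key, mirroring the KeyError in the traceback.

```python
# Toy model only: a "universe" mapping real dimension names to their
# dependencies. This is NOT the daf_butler DimensionUniverse API.
TOY_UNIVERSE = {
    "instrument": set(),
    "visit": {"instrument"},
    "detector": {"instrument"},
    "htm7": set(),  # a concrete SkyPix dimension; "skypix" itself is absent
}


def extract(names):
    """Expand dimension names with their dependencies, like a naive extract()."""
    result = set()
    for name in names:
        result.add(name)
        # Raises KeyError for names that are not real dimensions, e.g. "skypix".
        result.update(TOY_UNIVERSE[name])
    return result


try:
    extract(("skypix",))
except KeyError as exc:
    print("KeyError:", exc)
```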

          Here is a connection that crashes:

          ipdb> p taskDef.connections
          <lsst.pipe.tasks.calibrate.CalibrateConnections object at 0x7fc6d526fc50>
          ipdb> p taskDef.connections.photoRefCat
          PrerequisiteInput(name='cal_ref_cat', storageClass='SimpleCatalog', doc='Reference catalog to use for photometric calibration', multiple=True, dimensions=('skypix',), deferLoad=True, lookupFunction=None)
          

          which is defined as

              photoRefCat = cT.PrerequisiteInput(
                  doc="Reference catalog to use for photometric calibration",
                  name="cal_ref_cat",
                  storageClass="SimpleCatalog",
                  dimensions=("skypix",),
                  deferLoad=True,
                  multiple=True
              )
          

          I'm not sure how this is supposed to work, though. Jim Bosch, can you share your wisdom: how can we make makeDatasetType() not crash for this connection? Do we need to catch "skypix" and replace it with some logic?

          Jim Bosch added a comment:

          do we need to catch "skypix" and replace it with some logic?

          Yes, exactly. "skypix" is now a placeholder that is replaced by the name of an actual SkyPixDimension, which requires the dataset type to already be registered in the registry. The special-case logic used when building QuantumGraphs is at https://github.com/lsst/pipe_base/blob/master/python/lsst/pipe/base/pipeline.py#L459.
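A minimal sketch of the substitution described above, under a toy data model: `registered` is a hypothetical mapping from dataset type name to its registered dimension names, and `skypix_dimensions` is the set of concrete SkyPix dimension names. The helper `resolve_skypix` and the use of "htm7" for the reference catalog are illustrative assumptions, not the pipe_base code at the linked line.

```python
from typing import FrozenSet, Mapping, Tuple

# Placeholder used in connection definitions such as dimensions=("skypix",).
SKYPIX_PLACEHOLDER = "skypix"


def resolve_skypix(
    dataset_type_name: str,
    dimensions: Tuple[str, ...],
    registered: Mapping[str, Tuple[str, ...]],
    skypix_dimensions: FrozenSet[str],
) -> Tuple[str, ...]:
    """Replace the "skypix" placeholder with the SkyPix dimension of the
    already-registered dataset type (hypothetical helper, toy data model)."""
    if SKYPIX_PLACEHOLDER not in dimensions:
        return dimensions
    if dataset_type_name not in registered:
        # Without a registered definition there is no way to know which
        # concrete SkyPix dimension (e.g. "htm7") is meant.
        raise LookupError(
            f"Dataset type {dataset_type_name!r} uses the 'skypix' placeholder "
            "but is not registered."
        )
    actual = [d for d in registered[dataset_type_name] if d in skypix_dimensions]
    if len(actual) != 1:
        raise ValueError(
            f"Expected exactly one SkyPix dimension for {dataset_type_name!r}, "
            f"found {actual}."
        )
    return tuple(actual[0] if d == SKYPIX_PLACEHOLDER else d for d in dimensions)


# Toy registry contents; "htm7" for the reference catalog is an assumption.
registered = {"cal_ref_cat": ("htm7",)}
print(resolve_skypix("cal_ref_cat", ("skypix",), registered, frozenset({"htm7", "htm9"})))
```

The lookup failing loudly for an unregistered dataset type reflects the point above: the placeholder cannot be resolved without the registry.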

          Andy Salnikov added a comment:

          Jim Bosch, could you look at this when you have a minute? I had to do some creative handling of the "skypix" dimension to avoid going to the registry for the dataset type definition. The output DOT file still shows "skypix", which I think is OK for the purpose of displaying pipeline connections. An example graph is attached to this ticket; prerequisite inputs are shown with dashed arrows.
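The display approach described here can be sketched as follows. `ToyTaskDef` and `pipeline_to_dot` are hypothetical stand-ins, not the ctrl_mpexec dotTools API, and the dimensions given for "icExp" and "calexp" are illustrative. Dimensions are emitted as plain labels, so "skypix" never needs a registry lookup, and prerequisite-input edges get `style=dashed` to distinguish them from regular inputs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (dataset type name, dimension names); "skypix" is kept as a plain label.
Connection = Tuple[str, Tuple[str, ...]]


@dataclass
class ToyTaskDef:
    """Hypothetical stand-in for a task definition; not the pipe_base TaskDef."""
    label: str
    inputs: List[Connection] = field(default_factory=list)
    prerequisites: List[Connection] = field(default_factory=list)
    outputs: List[Connection] = field(default_factory=list)


def pipeline_to_dot(tasks: List[ToyTaskDef]) -> str:
    """Render tasks and dataset types as DOT; prerequisite edges are dashed."""
    lines = ["digraph Pipeline {"]
    datasets: Dict[str, Tuple[str, ...]] = {}

    def dataset_node(conn: Connection) -> str:
        name, dims = conn
        if name not in datasets:
            datasets[name] = dims
            # Dimensions are shown verbatim, so "skypix" needs no registry lookup.
            label = name + "\\n" + ", ".join(dims)
            lines.append(f'  "ds:{name}" [shape=box, label="{label}"];')
        return f'"ds:{name}"'

    for task in tasks:
        task_node = f'"task:{task.label}"'
        lines.append(f'  {task_node} [shape=ellipse, label="{task.label}"];')
        for conn in task.inputs:
            lines.append(f"  {dataset_node(conn)} -> {task_node};")
        for conn in task.prerequisites:
            lines.append(f"  {dataset_node(conn)} -> {task_node} [style=dashed];")
        for conn in task.outputs:
            lines.append(f"  {task_node} -> {dataset_node(conn)};")
    lines.append("}")
    return "\n".join(lines)


calibrate = ToyTaskDef(
    label="calibrate",
    inputs=[("icExp", ("instrument", "visit", "detector"))],
    prerequisites=[("cal_ref_cat", ("skypix",))],
    outputs=[("calexp", ("instrument", "visit", "detector"))],
)
print(pipeline_to_dot([calibrate]))
```

The printed text can be rendered with Graphviz, for example `dot -Tsvg`. Prefixing node IDs with "task:" and "ds:" keeps a task label from colliding with a dataset type of the same name.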

          Andy Salnikov added a comment:

          PR link: https://github.com/lsst/ctrl_mpexec/pull/45
          Jim Bosch added a comment:

          Looks good! Sorry this fell off my radar for a few days.


            People

            • Assignee:
              salnikov Andy Salnikov
              Reporter:
              salnikov Andy Salnikov
              Reviewers:
              Jim Bosch
              Watchers:
              Andy Salnikov, Jim Bosch
            • Votes:
              0
              Watchers:
              2
