Data Management / DM-25809

pipetask --skip-existing with partially existing outputs fails with AttributeError: '_QuantumScaffolding' object has no attribute 'taskDef'

    Details

    • Story Points:
      0
    • Team:
      Architecture
    • Urgent?:
      No

      Description

      Running pipetask run with the --skip-existing option on a collection where partial outputs already exist fails as follows:

      Failed to build graph: '_QuantumScaffolding' object has no attribute 'taskDef'
      Traceback (most recent call last):
        File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-4-g936b1ea+1/bin/pipetask", line 26, in <module>
          sys.exit(CmdLineFwk().parseAndRun())
        File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-4-g936b1ea+1/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 474, in parseAndRun
          qgraph = self.makeGraph(pipeline, args)
        File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-4-g936b1ea+1/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 620, in makeGraph
          qgraph = graphBuilder.makeGraph(pipeline, collections, run, args.data_query)
        File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/pipe_base/20.0.0-2-g04cfba9/python/lsst/pipe/base/graphBuilder.py", line 789, in makeGraph
          scaffolding.resolveDatasetRefs(self.registry, collections, run, skipExisting=self.skipExisting)
        File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/pipe_base/20.0.0-2-g04cfba9/python/lsst/pipe/base/graphBuilder.py", line 654, in resolveDatasetRefs
          f"Quantum {quantum.dataId} of task with label "
      AttributeError: '_QuantumScaffolding' object has no attribute 'taskDef'
      

      My steps to reproduce this, using a built ci_hsc_gen3 repo with w_2020_27:

      # first run the ISR part and establish the output collection
      pipetask run -b $CI_HSC_GEN3_DIR/DATA/butler.yaml -i HSC/calib,HSC/raw/all,HSC/masks,ref_cats,skymaps,shared/ci_hsc --output debug02 -t lsst.ip.isr.IsrTask:isr  -d "exposure = 903334 and detector = 22" --instrument lsst.obs.subaru.HyperSuprimeCam
       
      # Check the output collection in the datastore and obtain the run timestamp
      # Replace the timestamp in the next two commands with the correct one
      # Touch an empty file to intentionally make the next pipetask command fail
      mkdir -p $CI_HSC_GEN3_DIR/DATA/debug02/20200706T14h50m08s/icSrc/r/HSC-R/903334/
      touch $CI_HSC_GEN3_DIR/DATA/debug02/20200706T14h50m08s/icSrc/r/HSC-R/903334/icSrc_r_HSC-R_903334_22_HSC_debug02_20200706T14h50m08s.fits
       
      # Run processCcd the first time.  This should fail due to FileExistsError.
      # I do this to make it fail (presumably) after writing out partial outputs
      pipetask run -b $CI_HSC_GEN3_DIR/DATA/butler.yaml --output debug02 -p $PIPE_TASKS_DIR/pipelines/ProcessCcd.yaml -d "exposure = 903334 and detector = 22" --instrument lsst.obs.subaru.HyperSuprimeCam --extend-run --skip-existing
       
      # Then, remove the empty file  (Replace the timestamp!) 
      rm $CI_HSC_GEN3_DIR/DATA/debug02/20200706T14h50m08s/icSrc/r/HSC-R/903334/icSrc_r_HSC-R_903334_22_HSC_debug02_20200706T14h50m08s.fits
       
      # Run again. This time it will hit the AttributeError 
      pipetask run -b $CI_HSC_GEN3_DIR/DATA/butler.yaml --output debug02 -p $PIPE_TASKS_DIR/pipelines/ProcessCcd.yaml -d "exposure = 903334 and detector = 22" --instrument lsst.obs.subaru.HyperSuprimeCam --extend-run --skip-existing
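
The "replace the timestamp" steps above can be scripted. This is a minimal Python sketch, assuming the run directories sit directly under the collection directory (as the paths above suggest) and that their timestamp names sort chronologically. `latest_run_dir` is a hypothetical helper, not part of pipetask or the butler.

```python
from pathlib import Path


def latest_run_dir(collection_root):
    """Return the name of the newest run directory, or None if there is none.

    Assumes run directories are named like 20200706T14h50m08s, so that
    plain string sorting matches chronological order.
    """
    runs = sorted(p.name for p in Path(collection_root).iterdir() if p.is_dir())
    return runs[-1] if runs else None


# Example use (path layout taken from the steps above):
#   import os
#   root = Path(os.environ["CI_HSC_GEN3_DIR"]) / "DATA" / "debug02"
#   print(latest_run_dir(root))
```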
      

          Activity

          hchiang2 Hsin-Fang Chiang added a comment -

          I have a built ci_hsc_gen3 repo at /project/hchiang2/ci_hsc_gen3/w_2020_27/ci_hsc_gen3_copy/ on lsst-dev. Copying it may be faster than running your own copy.

          tjenness Tim Jenness added a comment -

          A quick glance at the code in pipe_base suggests that quantum.task.taskDef might be the right answer (and there is a repeat of this bug in the repr definition of _QuantumScaffolding as well).
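
To illustrate the suggestion, here is a small sketch with stand-in classes. These are not the real pipe_base classes; the only thing taken from the thread is that the quantum object has no `taskDef` of its own and reaches it through its `task` attribute. Every class and function name below is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaskDef:
    """Stand-in for a task definition; only the label matters here."""
    label: str


@dataclass
class TaskScaffolding:
    """Stand-in for the per-task helper that owns the TaskDef."""
    taskDef: TaskDef


@dataclass
class QuantumScaffolding:
    """Stand-in for _QuantumScaffolding: it refers to its task helper
    and has no taskDef attribute of its own."""
    task: TaskScaffolding
    dataId: dict = field(default_factory=dict)


def describe_broken(quantum):
    # Mirrors the failing lookup: the quantum has no taskDef attribute.
    return f"Quantum {quantum.dataId} of task with label {quantum.taskDef.label!r}"


def describe_fixed(quantum):
    # Goes through the task helper, as suggested in the comment above.
    return f"Quantum {quantum.dataId} of task with label {quantum.task.taskDef.label!r}"


quantum = QuantumScaffolding(
    task=TaskScaffolding(taskDef=TaskDef(label="charImage")),
    dataId={"instrument": "HSC", "detector": 22, "visit": 903334},
)
```

Calling `describe_broken(quantum)` raises an AttributeError that mentions `taskDef`, while `describe_fixed(quantum)` builds the message with the label.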

          hchiang2 Hsin-Fang Chiang added a comment - - edited

          Replacing quantum.taskDef.label with quantum.task.taskDef.label got me to the next error:

          $ pipetask run -b $CI_HSC_GEN3_DIR/DATA/butler.yaml --output debug02 -p $PIPE_TASKS_DIR/pipelines/ProcessCcd.yaml -d "exposure = 903334 and detector = 22" --instrument lsst.obs.subaru.HyperSuprimeCam --extend-run --skip-existing
          Failed to build graph: Quantum {instrument: HSC, detector: 22, visit: 903334} of task with label 'charImage' has some outputs that exist ([DatasetRef(DatasetType(icExp, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, ExposureF), {instrument: HSC, detector: 22, visit: 903334}, id=1757, run='debug02/20200706T14h50m08s')]) and others that don't ([DatasetRef(DatasetType(icSrc, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, SourceCatalog), {instrument: HSC, detector: 22, visit: 903334}), DatasetRef(DatasetType(icExpBackground, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, Background), {instrument: HSC, detector: 22, visit: 903334}), DatasetRef(DatasetType(charImage_metadata, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, PropertyList), {instrument: HSC, detector: 22, visit: 903334})]).
          Traceback (most recent call last):
            File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-4-g936b1ea+1/bin/pipetask", line 26, in <module>
              sys.exit(CmdLineFwk().parseAndRun())
            File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-4-g936b1ea+1/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 474, in parseAndRun
              qgraph = self.makeGraph(pipeline, args)
            File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-4-g936b1ea+1/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 620, in makeGraph
              qgraph = graphBuilder.makeGraph(pipeline, collections, run, args.data_query)
            File "/home/hchiang2/stack/pipe_base/python/lsst/pipe/base/graphBuilder.py", line 789, in makeGraph
              scaffolding.resolveDatasetRefs(self.registry, collections, run, skipExisting=self.skipExisting)
            File "/home/hchiang2/stack/pipe_base/python/lsst/pipe/base/graphBuilder.py", line 654, in resolveDatasetRefs
              f"Quantum {quantum.dataId} of task with label "
          lsst.pipe.base.graphBuilder.OutputExistsError: Quantum {instrument: HSC, detector: 22, visit: 903334} of task with label 'charImage' has some outputs that exist ([DatasetRef(DatasetType(icExp, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, ExposureF), {instrument: HSC, detector: 22, visit: 903334}, id=1757, run='debug02/20200706T14h50m08s')]) and others that don't ([DatasetRef(DatasetType(icSrc, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, SourceCatalog), {instrument: HSC, detector: 22, visit: 903334}), DatasetRef(DatasetType(icExpBackground, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, Background), {instrument: HSC, detector: 22, visit: 903334}), DatasetRef(DatasetType(charImage_metadata, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, PropertyList), {instrument: HSC, detector: 22, visit: 903334})]).
          

          which seems to fail at the intended place. Indeed, some of the outputs exist and others do not. But shouldn't this case be allowed, with missing outputs meaning that the quantum needs to be included?

          jbosch Jim Bosch added a comment -

          "Indeed, some of the outputs exist and others do not. But shouldn't this case be allowed, with missing outputs meaning that the quantum needs to be included?"

          This is indeed the current expected behavior.  Including the quantum when some outputs are present is not unreasonable, but it means we have to solve the harder problem of what to do with the existing outputs (because new ones will clash with them in a database-constraint-violation sense):

          • We could delete them, and write outputs to the current run.  This is okay if there are no other datasets that reference them in provenance, which is probably the case, but guarding against those kinds of edge cases is still a pain.
          • We could move them to a different run.  At present this is not a low-level operation Registry permits, and while I can't think of any reason it's problematic, I'd want to think further about it.

          It would also be reasonable, I think, to prohibit this operation with --extend-run, but allow --skip-existing to look for existing outputs in some other collection (this may already be doable with some other set of options, but I don't think so). That might make --extend-run less usable for retries than it might otherwise be, but I think it's also a reasonable question whether it should be used for retries at all.
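
The behavior described in this comment can be summarized as a three-way decision per quantum when --skip-existing is in effect. The following is a rough sketch of that rule only, not the pipe_base implementation; `should_skip_quantum` and its arguments are hypothetical, and the exception class is a local stand-in for the one named in the traceback.

```python
class OutputExistsError(RuntimeError):
    """Local stand-in for the exception named in the traceback above."""


def should_skip_quantum(label, outputs):
    """Decide what to do with one quantum under --skip-existing.

    outputs maps a dataset type name to True if that output already exists.
    Returns True to skip the quantum and False to include it; raises
    OutputExistsError when only some of the outputs exist.
    """
    existing = sorted(name for name, found in outputs.items() if found)
    missing = sorted(name for name, found in outputs.items() if not found)
    if not existing:
        # Nothing has been written yet: the quantum must run.
        return False
    if not missing:
        # Everything is already there: the quantum can be skipped.
        return True
    # Partial outputs: new datasets would clash with the existing ones.
    raise OutputExistsError(
        f"Quantum of task with label {label!r} has some outputs that exist "
        f"({existing}) and others that don't ({missing})."
    )
```

With the outputs from the error message above (icExp present; icSrc, icExpBackground and charImage_metadata missing), this sketch raises, which matches the reported failure.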

          hchiang2 Hsin-Fang Chiang added a comment -

          Somehow most of the failures I got recently happened partway through, while jobs were writing their output files. So --skip-existing wouldn't be as useful if it can't handle partial outputs existing for a quantum.

          Both options seem fine to me. If we write the new outputs into a new run of the same collection, I assume both old and new runs of the collection will be butler-gettable afterwards?

          https://github.com/lsst/ctrl_mpexec/blob/8b60ee766868d154b29e9487d9d472f1bb34d2c1/python/lsst/ctrl/mpexec/cmdLineParser.py#L366 and the documentation around there are why I used --skip-existing together with --extend-run. It seems convenient to think that --skip-existing could look for existing data in a specified collection or run, and then think of the run part separately? Originally I meant to store a quantum graph that skips the existing data, and then run that quantum graph with a different run.

          tjenness Tim Jenness added a comment -

          Is this now moot with the new clobber-partial-outputs option from DM-26131?

          I take it we still need to merge the fixes to the error reporting that Hsin-Fang Chiang has already made?

          hchiang2 Hsin-Fang Chiang added a comment -

          Yes, I think with DM-25818 and DM-26131 this is no longer a problem.

          And the first error has been fixed in https://github.com/lsst/pipe_base/commit/88376a3730dc38df3fe46cfb70f2ee8abeced750

          I'm closing this as "Done" then.


            People

            Assignee:
            hchiang2 Hsin-Fang Chiang
            Reporter:
            hchiang2 Hsin-Fang Chiang
            Watchers:
            Hsin-Fang Chiang, Jim Bosch, Kian-Tat Lim, Tim Jenness