# pipetask --skip-existing with partially existing outputs fails with AttributeError: '_QuantumScaffolding' object has no attribute 'taskDef'


#### Details

• Type: Bug
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points: 0
• Team: Architecture
• Urgent?: No

#### Description

Running pipetask run with the --skip-existing option on a collection where partial outputs already exist fails as follows:

    Failed to build graph: '_QuantumScaffolding' object has no attribute 'taskDef'
    Traceback (most recent call last):
      File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-4-g936b1ea+1/bin/pipetask", line 26, in <module>
        sys.exit(CmdLineFwk().parseAndRun())
      File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-4-g936b1ea+1/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 474, in parseAndRun
        qgraph = self.makeGraph(pipeline, args)
      File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-4-g936b1ea+1/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 620, in makeGraph
        qgraph = graphBuilder.makeGraph(pipeline, collections, run, args.data_query)
      File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/pipe_base/20.0.0-2-g04cfba9/python/lsst/pipe/base/graphBuilder.py", line 789, in makeGraph
        scaffolding.resolveDatasetRefs(self.registry, collections, run, skipExisting=self.skipExisting)
      File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/pipe_base/20.0.0-2-g04cfba9/python/lsst/pipe/base/graphBuilder.py", line 654, in resolveDatasetRefs
        f"Quantum {quantum.dataId} of task with label "
    AttributeError: '_QuantumScaffolding' object has no attribute 'taskDef'

My steps to reproduce this using a built ci_hsc_gen3 repo, with w_2020_27 :

    # First run the ISR part and establish the output collection
    pipetask run -b $CI_HSC_GEN3_DIR/DATA/butler.yaml -i HSC/calib,HSC/raw/all,HSC/masks,ref_cats,skymaps,shared/ci_hsc --output debug02 -t lsst.ip.isr.IsrTask:isr -d "exposure = 903334 and detector = 22" --instrument lsst.obs.subaru.HyperSuprimeCam

    # Check the output collection in the datastore, obtain the run timestamp
    # Replace these two commands with the correct timestamp
    # Touch an empty file to intentionally make the next pipetask command fail
    mkdir -p $CI_HSC_GEN3_DIR/DATA/debug02/20200706T14h50m08s/icSrc/r/HSC-R/903334/
    touch $CI_HSC_GEN3_DIR/DATA/debug02/20200706T14h50m08s/icSrc/r/HSC-R/903334/icSrc_r_HSC-R_903334_22_HSC_debug02_20200706T14h50m08s.fits

    # Run processCcd the first time. This should fail due to FileExistsError.
    # I do this to make it fail (presumably) after writing out partial outputs
    pipetask run -b $CI_HSC_GEN3_DIR/DATA/butler.yaml --output debug02 -p $PIPE_TASKS_DIR/pipelines/ProcessCcd.yaml -d "exposure = 903334 and detector = 22" --instrument lsst.obs.subaru.HyperSuprimeCam --extend-run --skip-existing

    # Then, remove the empty file (Replace the timestamp!)
    rm $CI_HSC_GEN3_DIR/DATA/debug02/20200706T14h50m08s/icSrc/r/HSC-R/903334/icSrc_r_HSC-R_903334_22_HSC_debug02_20200706T14h50m08s.fits

    # Run again. This time it will hit the AttributeError
    pipetask run -b $CI_HSC_GEN3_DIR/DATA/butler.yaml --output debug02 -p $PIPE_TASKS_DIR/pipelines/ProcessCcd.yaml -d "exposure = 903334 and detector = 22" --instrument lsst.obs.subaru.HyperSuprimeCam --extend-run --skip-existing
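
The timestamp substitution in these steps could be scripted rather than edited by hand. This is a minimal sketch, assuming the run directories sit directly under DATA/debug02/ and are named by timestamp as in the paths above. The helper names latest_run_timestamp and placeholder_path are hypothetical and not part of any LSST package.

```python
from pathlib import Path


def latest_run_timestamp(collection_dir: Path) -> str:
    """Return the name of the newest timestamped run directory.

    Assumes names like 20200706T14h50m08s, which sort chronologically
    when compared as plain strings.
    """
    runs = sorted(p.name for p in collection_dir.iterdir() if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"no run directories under {collection_dir}")
    return runs[-1]


def placeholder_path(data_root: Path, collection: str, timestamp: str) -> Path:
    """Build the icSrc file path used in the reproduction steps."""
    name = f"icSrc_r_HSC-R_903334_22_HSC_{collection}_{timestamp}.fits"
    return (data_root / collection / timestamp
            / "icSrc" / "r" / "HSC-R" / "903334" / name)
```

With data_root set to $CI_HSC_GEN3_DIR/DATA, calling mkdir(parents=True, exist_ok=True) on the parent of the returned path and then touch() on the path itself mirrors the mkdir -p and touch commands.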

#### Activity

Hsin-Fang Chiang added a comment -

I have a built ci_hsc_gen3 repo at /project/hchiang2/ci_hsc_gen3/w_2020_27/ci_hsc_gen3_copy/ on lsst-dev. Copying it may be faster than running your own copy.

Tim Jenness added a comment -

A quick glance at the code in pipe_base suggests that quantum.task.taskDef might be the right answer (and there is a repeat of this bug in the repr definition of _QuantumScaffolding as well).
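
The attribute path can be illustrated with simplified stand-ins. The classes below are hypothetical, pared-down versions of the scaffolding objects, assuming only what the traceback and the suggestion above imply: a quantum holds a reference to its task scaffolding, which in turn holds the taskDef. They are not the actual pipe_base definitions.

```python
from dataclasses import dataclass


@dataclass
class TaskDef:
    """Stand-in for a task definition; only the label matters here."""
    label: str


@dataclass
class TaskScaffolding:
    """Stand-in for the per-task scaffolding that owns the taskDef."""
    taskDef: TaskDef


@dataclass
class QuantumScaffolding:
    """Stand-in for the per-quantum scaffolding; note it has no taskDef."""
    task: TaskScaffolding
    dataId: dict


quantum = QuantumScaffolding(
    task=TaskScaffolding(TaskDef(label="charImage")),
    dataId={"instrument": "HSC", "detector": 22, "visit": 903334},
)

# Buggy access: the quantum has no taskDef attribute of its own.
try:
    quantum.taskDef.label
except AttributeError as err:
    buggy_error = str(err)

# Suggested access: go through the owning task.
label = quantum.task.taskDef.label
```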

Hsin-Fang Chiang added a comment - edited

Replacing quantum.taskDef.label with quantum.task.taskDef.label got me to the next error:

    $ pipetask run -b $CI_HSC_GEN3_DIR/DATA/butler.yaml --output debug02 -p $PIPE_TASKS_DIR/pipelines/ProcessCcd.yaml -d "exposure = 903334 and detector = 22" --instrument lsst.obs.subaru.HyperSuprimeCam --extend-run --skip-existing
    Failed to build graph: Quantum {instrument: HSC, detector: 22, visit: 903334} of task with label 'charImage' has some outputs that exist ([DatasetRef(DatasetType(icExp, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, ExposureF), {instrument: HSC, detector: 22, visit: 903334}, id=1757, run='debug02/20200706T14h50m08s')]) and others that don't ([DatasetRef(DatasetType(icSrc, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, SourceCatalog), {instrument: HSC, detector: 22, visit: 903334}), DatasetRef(DatasetType(icExpBackground, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, Background), {instrument: HSC, detector: 22, visit: 903334}), DatasetRef(DatasetType(charImage_metadata, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, PropertyList), {instrument: HSC, detector: 22, visit: 903334})]).
    Traceback (most recent call last):
      File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-4-g936b1ea+1/bin/pipetask", line 26, in <module>
        sys.exit(CmdLineFwk().parseAndRun())
      File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-4-g936b1ea+1/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 474, in parseAndRun
        qgraph = self.makeGraph(pipeline, args)
      File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-4-g936b1ea+1/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 620, in makeGraph
        qgraph = graphBuilder.makeGraph(pipeline, collections, run, args.data_query)
      File "/home/hchiang2/stack/pipe_base/python/lsst/pipe/base/graphBuilder.py", line 789, in makeGraph
        scaffolding.resolveDatasetRefs(self.registry, collections, run, skipExisting=self.skipExisting)
      File "/home/hchiang2/stack/pipe_base/python/lsst/pipe/base/graphBuilder.py", line 654, in resolveDatasetRefs
        f"Quantum {quantum.dataId} of task with label "
    lsst.pipe.base.graphBuilder.OutputExistsError: Quantum {instrument: HSC, detector: 22, visit: 903334} of task with label 'charImage' has some outputs that exist ([DatasetRef(DatasetType(icExp, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, ExposureF), {instrument: HSC, detector: 22, visit: 903334}, id=1757, run='debug02/20200706T14h50m08s')]) and others that don't ([DatasetRef(DatasetType(icSrc, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, SourceCatalog), {instrument: HSC, detector: 22, visit: 903334}), DatasetRef(DatasetType(icExpBackground, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, Background), {instrument: HSC, detector: 22, visit: 903334}), DatasetRef(DatasetType(charImage_metadata, {abstract_filter, instrument, detector, physical_filter, visit_system, visit}, PropertyList), {instrument: HSC, detector: 22, visit: 903334})]).

which seems to fail at the intended place. Indeed, some of the outputs exist and others do not. But shouldn't this case be allowed, with missing outputs meaning that the quantum needs to be included?

Jim Bosch added a comment -

> Indeed some of the outputs exist and some other outputs do not. But, shouldn't this case be allowed, and missing outputs means that quantum needs to be included?

This is indeed the current expected behavior.  Including the quantum when some outputs are present is not unreasonable, but it means we have to solve the harder problem of what to do with the existing outputs (because new ones will clash with them in a database-constraint-violation sense):

• We could delete them, and write outputs to the current run.  This is okay if there are no other datasets that reference them in provenance, which is probably the case, but guarding against those kinds of edge cases is still a pain.
• We could move them to a different run.  At present this is not a low-level operation Registry permits, and while I can't think of any reason it's problematic, I'd want to think further about it.

It would also be reasonable, I think, to prohibit this operation with --extend-run, but allow --skip-existing to look for existing outputs in some other collection (this may already be doable with some other set of options, but I don't think so). That might make --extend-run less usable for retries than it might otherwise be, but I think it's also a reasonable question whether it should be used for retries.
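
The behavior under discussion amounts to a three-way decision per quantum. The following is a sketch only, assuming that a quantum whose outputs all exist is skipped and one with no outputs is included. classify_quantum and the local OutputExistsError class are illustrative stand-ins, not the pipe_base implementation.

```python
class OutputExistsError(RuntimeError):
    """Stand-in for the error raised on partially existing outputs."""


def classify_quantum(label, expected_outputs, existing_outputs):
    """Return "skip" or "run" for one quantum under --skip-existing.

    expected_outputs: names of all dataset types the quantum writes.
    existing_outputs: names already present in the output run.
    Raises OutputExistsError when only some outputs exist.
    """
    expected = set(expected_outputs)
    existing = expected & set(existing_outputs)
    missing = expected - existing
    if not missing:
        return "skip"  # every output is already there
    if not existing:
        return "run"   # nothing has been written yet
    raise OutputExistsError(
        f"task {label!r} has some outputs that exist ({sorted(existing)}) "
        f"and others that don't ({sorted(missing)})"
    )
```

For the charImage quantum in the error above, the expected outputs are icExp, icSrc, icExpBackground and charImage_metadata, and only icExp exists, so this sketch takes the error branch. Either option in the list above would replace that branch with something that clears or relocates the existing outputs before returning "run".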

Hsin-Fang Chiang added a comment -

Somehow most of the failures I got recently happened halfway when jobs were writing out output files. So --skip-existing wouldn't be as useful if it can't handle partial outputs existing for a quantum.

Both options seem fine to me. If we write the new outputs into a new run of the same collection, I assume both old and new runs of the collection will be butler-gettable afterwards?

https://github.com/lsst/ctrl_mpexec/blob/8b60ee766868d154b29e9487d9d472f1bb34d2c1/python/lsst/ctrl/mpexec/cmdLineParser.py#L366 and the documentation around there are why I used --skip-existing together with --extend-run. It seems convenient to think that --skip-existing could look for existing data in a specified collection or run, and then think of the run part separately? Originally I meant to store a quantum graph skipping the existing data, and run the quantum graph with a different run.

Tim Jenness added a comment -

Is this now moot with the new clobber-partial-outputs option from DM-26131?

I take it we still need to merge the fixes to the error reporting that Hsin-Fang Chiang has already made?

Hsin-Fang Chiang added a comment -

Yes, I think with DM-25818 and DM-26131 this is no longer a problem.

And the first error has been fixed in https://github.com/lsst/pipe_base/commit/88376a3730dc38df3fe46cfb70f2ee8abeced750

I'm closing this as "Done" then.


#### People

Assignee:
Hsin-Fang Chiang
Reporter:
Hsin-Fang Chiang
Watchers:
Hsin-Fang Chiang, Jim Bosch, Kian-Tat Lim, Tim Jenness
Votes:
0

#### Dates

Created:
Updated:
Resolved:
