# Test dataset disassembly with ci_hsc_gen3

#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points: 2
• Team: Architecture
• Urgent?: No

#### Description

Composite disassembly is not properly tested anywhere for afw Exposures. Run ci_hsc_gen3 with disassembly turned on and see what breaks.

#### Activity

Tim Jenness added a comment - - edited

What breaks on first pass:

• postISRCcd uses Exposure storage class but Image and Mask have no entries in the formatter list.
• When running the pipeline, the afw reader segfaults when reading the mask.

    isr.fringe INFO: Filter not found in FringeTaskConfig.filters. Skipping fringe correction.
    isr INFO: Constructing Vignette polygon.
    isr INFO: Adding transmission curves.
    isr INFO: Set 234101 BAD pixels to 180.411636.
    isr INFO: Interpolating masked pixels.
    isr INFO: Setting rough magnitude zero point: 32.692803
    isr INFO: Measuring background level.
    isr INFO: Flattened sky level: 180.404129 +/- 7.339000.
    isr INFO: Measuring sky levels in 8x16 grids: 180.464220.
    isr INFO: Sky flatness in 8x16 grids - pp: 0.069466 rms: 0.009497.
    afw.image.MaskedImageFitsReader WARN: Mask unreadable (FitsReader not initialized; desired HDU is probably missing.); using default
    Caught signal 11, backtrace follows:
    0 libutils.dylib 0x0000000108b94b02 lsst::utils::(anonymous namespace)::signalHandler(int) + 82
    1 libsystem_platform.dylib 0x00007fff6814f5fd (null) + 29
    1 libsystem_platform.dylib 0x00007fff6814f5fd _sigtramp + 29
    2 liblog.dylib 0x000000010f98ddc4 lsst::log::Log::log(log4cxx::helpers::ObjectPtrT<log4cxx::Level>, log4cxx::spi::LocationInfo const&, char const*, ...) + 452
    3 libafw.dylib 0x0000000113243401 lsst::afw::image::Exposure<float, int, float> lsst::afw::image::ExposureFitsReader::read<float, int, float>(lsst::geom::Box2I const&, lsst::afw::image::ImageOrigin, bool, bool) + 33
    4 readers.so 0x0000000117c2c608 (null)$_4NS_6objectEJRNS4_18ExposureFitsReaderERKNS2_4geom5Box2IENS4_11ImageOriginEbbS9_EJNS_4nameENS_9is_methodENS_7siblingENS_5arg_vESK_SK_SK_SK_EEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE_8__invokeESY_ + 1192
    4 readers.so 0x0000000117c2c608 _ZZN8pybind1112cpp_function10initializeIZN4lsst3afw5image12_GLOBAL__N_125declareExposureFitsReaderERNS_6moduleEE3$_4NS_6objectEJRNS4_18ExposureFitsReaderERKNS2_4geom5Box2IENS4_11ImageOriginEbbS9_EJNS_4nameENS_9is_methodENS_7siblingENS_5arg_vESK_SK_SK_SK_EEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE_8__invokeESY_ + 1192
    5 readers.so 0x0000000117bfcee9 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 3865

Tim Jenness added a comment -

As an aside, Andy Salnikov: when the Python processes crashed, multiprocessing somehow never really noticed, and the error messages continued to flood out even though I only had -j3. Should Python multiprocessing spot when a subprocess crashes?

Andy Salnikov added a comment -

In the current implementation it does not stop immediately; I think I remember we wanted to finish as many quanta as possible even if one or a few of them fail. pipetask should fail eventually when one of the subprocesses fails, but the current logic is probably too trivial to handle all possible cases, or even to give any guarantees as to when things will fail. We probably need to improve it.

Andy Salnikov added a comment -

And I think it's even more complicated (just like anything in Python multiprocessing). I did a quick test and it looks like subprocesses are not even allowed to crash; a crash confuses multiprocessing completely. Tim Jenness, what happened to your job: did it stop eventually or did you have to kill it?

Tim Jenness added a comment -

I think it hung at the end, after all the quanta were processed, and I had to kill it.

Yes, segv is probably bad. I haven't worked out why the segv is happening yet since if I try to read the file I get the error about the bad file HDU but it doesn't crash. I'm adding more logging.

Andy Salnikov added a comment -

OK, that is consistent with what I expected in case of a crash. I do not know yet how to work around that (and I think we should); I guess I need a new ticket for that.
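
For anyone reproducing the crash behaviour discussed above, here is a minimal sketch (not the ctrl_mpexec implementation) of one way a parent process can notice a worker that dies from a signal. It uses `concurrent.futures.ProcessPoolExecutor` from the standard library, which raises `BrokenProcessPool` when a worker exits abruptly. The function name `crash_is_detected` is hypothetical, and the `fork` start method is an assumption that restricts the sketch to POSIX platforms.

```python
import multiprocessing
import signal
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def crash_is_detected() -> bool:
    """Return True if a worker killed by SIGSEGV is reported to the parent."""
    # Assumption: a POSIX platform where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        # signal.raise_signal is a builtin, so it can be sent to the worker
        # by name. The worker delivers SIGSEGV to itself and dies without
        # ever raising a Python exception.
        future = pool.submit(signal.raise_signal, signal.SIGSEGV)
        try:
            # The timeout is only a safety net against an unexpected hang.
            future.result(timeout=60)
        except BrokenProcessPool:
            return True
    return False


if __name__ == "__main__":
    print("worker crash detected:", crash_is_detected())
```

Whether a different executor is appropriate for pipetask is a separate question; the sketch only shows that the standard library can surface this kind of failure as an exception in the parent.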

Tim Jenness added a comment -

Simon Krughoff, would you be able to review this ticket? It's mostly some cleanups of the FITS Exposure formatter. There is one fix to butler itself to handle parameters for disassembled composites: I was losing some parameters rather than letting each formatter process the parameters it understands and removing those from the initial list.
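
The parameter handling described above can be illustrated with a small sketch. This is not the butler code: `route_parameters`, the component names, and the parameter names are all hypothetical, chosen only to show the idea that every component formatter is offered the full request, takes what it understands, and whatever nobody took is left over to be reported.

```python
from typing import Any, Dict, Mapping, Set, Tuple


def route_parameters(
    requested: Mapping[str, Any],
    understood_by: Mapping[str, Set[str]],
) -> Tuple[Dict[str, Dict[str, Any]], Dict[str, Any]]:
    """Split read parameters between component formatters.

    Returns the parameters routed to each component and the parameters
    that no component understood.
    """
    routed: Dict[str, Dict[str, Any]] = {}
    unused = dict(requested)
    for component, understood in understood_by.items():
        # Each formatter sees the full original request, so a parameter
        # such as a bounding box can apply to several components.
        routed[component] = {k: v for k, v in requested.items() if k in understood}
        # Anything a formatter handled is removed from the leftover list.
        for key in routed[component]:
            unused.pop(key, None)
    return routed, unused


# Hypothetical request against a disassembled exposure.
requested = {"bbox": (0, 0, 10, 10), "origin": "PARENT", "typo": 1}
understood_by = {
    "image": {"bbox", "origin"},
    "mask": {"bbox", "origin"},
    "wcs": set(),
}
routed, unused = route_parameters(requested, understood_by)
print(routed["mask"])  # {'bbox': (0, 0, 10, 10), 'origin': 'PARENT'}
print(unused)          # {'typo': 1}
```

Offering the original request to every formatter, rather than a shrinking list, is what avoids the "losing some parameters" problem: the mask formatter still sees `bbox` even though the image formatter already used it.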

Tim Jenness added a comment -

All of the changes on this ticket are good and should be reviewed, and ci_hsc_gen3 does complete with them. A fundamental problem, though, is that when multiprocessing is enabled (with the -j option) it breaks because the FILTER singleton is not initialized. Normally this is done when an Instrument is instantiated, but in multiprocessing this never happens.

The error is:

    Traceback (most recent call last):
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe-973126a/lib/python3.7/multiprocessing/pool.py", line 121, in worker
        result = (True, func(*args, **kwds))
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/ctrl_mpexec/20.0.0-7-g518c986+1/python/lsst/ctrl/mpexec/mpGraphExecutor.py", line 208, in _executePipelineTask
        return executor.execute(taskDef, quantum, butler)
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/ctrl_mpexec/20.0.0-7-g518c986+1/python/lsst/ctrl/mpexec/singleQuantumExecutor.py", line 82, in execute
        self.runQuantum(task, quantum, taskDef, butler)
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/ctrl_mpexec/20.0.0-7-g518c986+1/python/lsst/ctrl/mpexec/singleQuantumExecutor.py", line 224, in runQuantum
        task.runQuantum(butlerQC, inputRefs, outputRefs)
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/pipe_tasks/20.0.0-12-g2a9f6943+2/python/lsst/pipe/tasks/calibrate.py", line 621, in runQuantum
        outputs = self.run(**inputs)
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/pipe_tasks/20.0.0-12-g2a9f6943+2/python/lsst/pipe/tasks/calibrate.py", line 721, in run
        sourceCat=sourceCat,
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/pipe_base/20.0.0-6-g9c77118/python/lsst/pipe/base/timer.py", line 150, in wrapper
        res = func(self, *args, **keyArgs)
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/meas_astrom/20.0.0-1-gc96f8cb+5/python/lsst/meas/astrom/astrometry.py", line 152, in run
        res = self.solve(exposure=exposure, sourceCat=sourceCat)
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/pipe_base/20.0.0-6-g9c77118/python/lsst/pipe/base/timer.py", line 150, in wrapper
        res = func(self, *args, **keyArgs)
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/meas_astrom/20.0.0-1-gc96f8cb+5/python/lsst/meas/astrom/astrometry.py", line 196, in solve
        epoch=expMd.epoch,
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/meas_algorithms/20.0.0-4-g085c40a3+2/python/lsst/meas/algorithms/loadReferenceObjects.py", line 324, in loadPixelBox
        return self.loadRegion(outerSkyRegion, filtFunc=_filterFunction, epoch=epoch, filterName=filterName)
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/meas_algorithms/20.0.0-4-g085c40a3+2/python/lsst/meas/algorithms/loadReferenceObjects.py", line 434, in loadRegion
        fluxField = getRefFluxField(schema=expandedCat.schema, filterName=filterName)
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/meas_algorithms/20.0.0-4-g085c40a3+2/python/lsst/meas/algorithms/loadReferenceObjects.py", line 729, in getRefFluxField
        raise RuntimeError("Could not find flux field(s) %s" % (", ".join(fluxFieldList)))
    RuntimeError: Could not find flux field(s) camFlux
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/ctrl_mpexec/20.0.0-7-g518c986+1/bin/pipetask", line 26, in <module>
        sys.exit(CmdLineFwk().parseAndRun())
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/ctrl_mpexec/20.0.0-7-g518c986+1/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 494, in parseAndRun
        return self.runPipeline(qgraph, taskFactory, args)
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/ctrl_mpexec/20.0.0-7-g518c986+1/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 694, in runPipeline
        executor.execute(graph, butler)
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/ctrl_mpexec/20.0.0-7-g518c986+1/python/lsst/ctrl/mpexec/mpGraphExecutor.py", line 71, in execute
        self._executeQuantaMP(quantaIter, butler)
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/973126a/DarwinX86/ctrl_mpexec/20.0.0-7-g518c986+1/python/lsst/ctrl/mpexec/mpGraphExecutor.py", line 175, in _executeQuantaMP
        results[dep].get(self.timeout)
      File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe-973126a/lib/python3.7/multiprocessing/pool.py", line 657, in get
        raise self._value
    RuntimeError: Could not find flux field(s) camFlux

In single-process mode the filter YAML looks like:

    aliases:
    - W-S-R+
    - HSC-R
    canonicalName: r
    name: r
    properties:
      lambdaEff: 623.0
      lambdaMax: .nan
      lambdaMin: .nan

but in multiprocessing it looks like:

    aliases: []
    canonicalName: _unknown_
    name: r
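
The difference between the two YAML dumps can be mimicked with a toy registry. This is only a sketch of the general pattern (a process-wide lookup table that must be populated before use); it is not the afw Filter API, and `define_filter`, `describe_filter`, and `_FILTERS` are hypothetical names.

```python
from typing import Any, Dict, List

# Hypothetical process-wide registry; empty until define_filter is called.
_FILTERS: Dict[str, Dict[str, Any]] = {}


def define_filter(name: str, aliases: List[str], lambda_eff: float) -> None:
    """Register a filter, as instrument setup would normally do."""
    _FILTERS[name] = {"aliases": list(aliases), "lambdaEff": lambda_eff}


def describe_filter(name: str) -> Dict[str, Any]:
    """Return the description that would be persisted for this filter name."""
    entry = _FILTERS.get(name)
    if entry is None:
        # Nothing registered: only the bare name survives.
        return {"aliases": [], "canonicalName": "_unknown_", "name": name}
    return {
        "aliases": entry["aliases"],
        "canonicalName": name,
        "name": name,
        "properties": {"lambdaEff": entry["lambdaEff"]},
    }


# Before registration the description matches the multiprocessing dump.
before = describe_filter("r")
# After registration it matches the single-process dump.
define_filter("r", aliases=["W-S-R+", "HSC-R"], lambda_eff=623.0)
after = describe_filter("r")
print(before)
print(after)
```

In this picture the failure is simply that the registration step never runs in the process that writes the component, so the persisted description loses its aliases and canonical name.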

Tim Jenness added a comment -

Following up on this: if I clone the Exposure read/write behavior with Filter (by using the FILTER keyword and ignoring what happens to be in Filter), everything works. This raises two issues:

1. The code in meas_astrom using HSC-R works but using r does not. I assume this is what we expect (John Parejko?)
2. There is a lot of metadata manipulation in Exposure.writeFits that is not reproducible if each component in Exposure is written out separately. In many cases this does not matter because we are explicitly calling writeFits on each component so we don't need to rely on storing the values in metadata and then regenerating them from metadata. Somewhere in the pipeline the Filter is set to r but never gets over-ridden by the FILTER header value.

I'm not sure how much of this is a problem. For now I will recreate how FILTER is handled in Exposure.writeFits inside ExposureAssembler. That can be removed when DM-26181 can be relied upon.
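
A minimal sketch of the workaround described above, under the assumption that the assembler has access to the header metadata as a mapping and to the name stored in the separately written Filter component. The function `resolve_filter_name` is hypothetical and is not taken from ExposureAssembler; it only shows the precedence rule of preferring the FILTER keyword.

```python
from typing import Mapping, Optional


def resolve_filter_name(
    metadata: Mapping[str, str],
    component_filter: Optional[str],
) -> Optional[str]:
    """Pick the filter name to use when reassembling an exposure.

    The FILTER header keyword wins when present, mirroring the behavior
    of reading a whole exposure; the stored component is only a fallback.
    """
    header_value = metadata.get("FILTER")
    if header_value:
        return header_value.strip()
    return component_filter


# The header value overrides the name carried by the filter component.
print(resolve_filter_name({"FILTER": "HSC-R"}, "r"))  # HSC-R
# Without a FILTER keyword the component value is used unchanged.
print(resolve_filter_name({}, "r"))                   # r
```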

John Parejko added a comment -

I don't understand your point 1) above. "using HSC-R" - using it how and where?

Tim Jenness added a comment -

John Parejko I'm sorry. See the stack trace in an early comment involving meas_astrom and filter determination. Whatever code was using the results of that meas_astrom filter determination failed when it thought the filter was "r" but worked fine when it thought it was HSC-R even without the filters for HSC being registered. It seemed like a part of the code you were familiar with. I don't know exactly which part of the pipeline was using it but it was reading the filter from an icExp.

Simon Krughoff added a comment -

Looks good.

#### People

Assignee: Tim Jenness
Reporter: Tim Jenness
Reviewers: Simon Krughoff
Watchers: Andy Salnikov, Jim Bosch, John Parejko, Michelle Gower, Simon Krughoff, Tim Jenness