# Ensure that filters are defined in pipetask multiprocessing


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points: 1
• Sprint: DB_F20_06
• Team: Data Access and Database
• Urgent?: No

#### Description

In DM-26119 I learned that when pipetask uses multiprocessing we never instantiate an Instrument and therefore we never define the filters for that instrument in the global singleton.

In single process mode it's fine because at some point an Instrument is created.

Modify pipetask multiprocessing so that the dataIds are scanned for the `instrument` dimension and we call `Instrument.fromName(dataId["instrument"], registry)`. Currently we only expect one instrument.

When the singleton is removed, it's likely that some related initialization will still be needed to register the filters, but we assume that would allow the same initialization to work for multiple instruments.
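The scan described above can be sketched as follows. This is a minimal, hedged illustration of the approach, not the actual ctrl_mpexec implementation: `from_name` stands in for `Instrument.fromName(name, registry)`, and the function name is hypothetical.

```python
from typing import Callable, Iterable, Mapping, Optional


def instantiate_instrument_from_data_ids(
    data_ids: Iterable[Mapping[str, object]],
    from_name: Callable[[str], object],
) -> Optional[object]:
    """Scan dataIds for the "instrument" dimension and instantiate it once.

    ``from_name`` is a stand-in for ``Instrument.fromName(name, registry)``.
    Only one instrument is expected across all dataIds, per the ticket.
    """
    names = {str(d["instrument"]) for d in data_ids if "instrument" in d}
    if not names:
        # e.g. coadd-level quanta whose dimensions have no instrument
        return None
    if len(names) > 1:
        raise RuntimeError(f"Expected a single instrument, got {sorted(names)}")
    # Instantiating the Instrument is what registers its filter
    # definitions in the global singleton.
    return from_name(names.pop())
```

Calling this once before forking worker processes would ensure the filter definitions exist in every child, which is the gap this ticket closes.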

CC Krzysztof Findeisen and John Parejko in case they have come across this problem before.

#### Activity

Tim Jenness created the issue.

• This issue is triggered by DM-26119
• This issue relates to DM-26069
• This issue relates to RFC-624
Eli Rykoff added a comment -

I want to add that it's not just in multiprocessing that the Instrument isn't instantiated. I've noticed that when running my in-progress fgcmcal pipetask I have to instantiate the Instrument manually. So in a single process, for some tasks, the Instrument is created somewhere incidentally, but not through any explicit pattern. Should this be done explicitly somewhere in pipetask, in addition to (or instead of) the multiprocessing run?

Status: To Do → In Progress
Andy Salnikov added a comment -

Tim Jenness, is it possible to reproduce the crash you saw on DM-26119? I probably don't want to write a unit test for that, but I need a way to verify that the fix is going to work.

Andy Salnikov added a comment -

Also, what is that "filter yaml" you mentioned there, and where do I get it from?

Tim Jenness added a comment -

You need to be using DM-26119 branches for obs_base and daf_butler. Then you need to run the ci_hsc_gen3 test with this override yaml:

```yaml
datastore:
  composites:
    default: true
    disassembled:
      ExposureI: true
      ExposureF: true
      calexp: true
```

Run it with `python $(command -v scons) --butler-config=override.yaml -j3` (or `scons --butler-config=override.yaml -j3`). When you do this you will get lots of disassembled exposures in `DATA/shared/ci_hsc_output/YYYYMMDD.../`, and some of them will be directories named `datasetType.filter`. If you see `unknown` anywhere in those filter YAML files, it's not working; if you see the rich content with aliases and canonical names, it is working. Unfortunately my ticket branch has a hack on it that forces the filter name to be HSC-R regardless, which means the code runs to completion. That doesn't matter, though, because after a few minutes you'll see some `.filter` YAML files and can inspect them.
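The inspection step Tim describes (looking for `unknown` in the disassembled `.filter` YAML files) can be scripted. This is a hedged sketch: the function name is hypothetical, and the directory layout is assumed from the description above.

```python
import pathlib


def find_unknown_filters(output_dir):
    """Return the .filter YAML files under ``output_dir`` that contain the
    ``_unknown_`` placeholder.

    Any hit means the Instrument's filter definitions were never
    registered before that file was written.
    """
    hits = []
    for path in sorted(pathlib.Path(output_dir).rglob("*.yaml")):
        if "_unknown_" in path.read_text():
            hits.append(path)
    return hits
```

Running it against the `ci_hsc_output` run directory and getting an empty list back would indicate the fix is working.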
Andy Salnikov added a comment -

My trivial fix works for most datasets, but for two of them (out of 100+) I'm still getting `unknown` in the filter YAML files:

```
deepCoadd_calexp.filter/0/69/r/deepCoadd_calexp_filter_0_69_r_discrete_ci_hsc_shared_ci_hsc_output_20200731T16h32m32s.yaml
2:canonicalName: _unknown_
deepCoadd_calexp.filter/0/69/i/deepCoadd_calexp_filter_0_69_i_discrete_ci_hsc_shared_ci_hsc_output_20200731T16h32m32s.yaml
2:canonicalName: _unknown_
```

Tim Jenness, do you know what could be special about these two, or where I should start digging?

Andy Salnikov added a comment -

It looks like these were made by `DetectCoaddSourcesTask`, and looking at that task's connections I do not see any `instrument` in its dimensions, but there is an `abstract_filter`.

Tim Jenness added a comment -

Yes, that is probably expected. I don't think there is anything we can do about it at the moment. I think what you have done will be a big improvement to the consistency of execution.

Andy Salnikov added a comment -

OK, thanks, I think it is ready for review then. Jenkins has just started; I'll wait until it finishes.

Eli Rykoff: this should fix both the single- and multi-process cases.

Reviewer: Tim Jenness; Status: In Progress → In Review
Andy Salnikov added a comment -

JIRA is slow as usual, PR is here: https://github.com/lsst/ctrl_mpexec/pull/62

Epic Link: DM-25244; Sprint: DB_F20_06; Story Points: 1
Tim Jenness added a comment -

Looks great.

Status: In Review → Reviewed
Andy Salnikov added a comment -

Thanks. I added an assert message, and I also had to add an (ugly) monkey-patch to the unit test; I somehow missed the unit test failure in the previous commit (but Jenkins alerted me). You can check the PR again to see if it is OK with you.

Resolution: Done; Status: Reviewed → Done

#### People

Assignee: Andy Salnikov
Reporter: Tim Jenness
Reviewers: Tim Jenness
Watchers: Andy Salnikov, Eli Rykoff, John Parejko, Krzysztof Findeisen, Tim Jenness