Data Management / DM-26181

Ensure that filters are defined in pipetask multiprocessing


    Details

    • Story Points:
      1
    • Sprint:
      DB_F20_06
    • Team:
      Data Access and Database
    • Urgent?:
      No

      Description

      In DM-26119 I learned that when pipetask uses multiprocessing we never instantiate an Instrument and therefore we never define the filters for that instrument in the global singleton.

      In single-process mode it's fine because an Instrument is created at some point.

      Modify pipetask multiprocessing such that the dataIds are scanned for the "instrument" dimension and we call Instrument.fromName(dataId["instrument"], registry). Currently we only expect one instrument.
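      The approach in the previous paragraph can be sketched in plain Python. This is a hypothetical, self-contained illustration, not the ctrl_mpexec or obs_base code: FILTER_REGISTRY, FakeInstrument, find_instrument and init_filters are made-up stand-ins for the global filter singleton and for Instrument.fromName(dataId["instrument"], registry), and the filter mapping is invented for the example. It shows the two points the description relies on: instantiating the instrument is what defines the filters, and the data IDs are expected to name at most one instrument.

```python
from typing import Dict, Iterable, Mapping, Optional

# Hypothetical stand-in for the global filter-definition singleton.
FILTER_REGISTRY: Dict[str, str] = {}


class FakeInstrument:
    """Hypothetical instrument whose construction registers its filters."""

    # Invented mapping: instrument name -> {physical filter: canonical name}.
    KNOWN = {"HSC": {"HSC-R": "r", "HSC-I": "i"}}

    def __init__(self, name: str) -> None:
        self.name = name
        # The side effect the ticket depends on: defining filters globally.
        FILTER_REGISTRY.update(self.KNOWN[name])

    @classmethod
    def from_name(cls, name: str) -> "FakeInstrument":
        # Loosely modelled on Instrument.fromName(name, registry); no registry here.
        return cls(name)


def find_instrument(data_ids: Iterable[Mapping[str, object]]) -> Optional[str]:
    """Scan data IDs for the "instrument" dimension; at most one value is allowed."""
    names = {str(d["instrument"]) for d in data_ids if "instrument" in d}
    if len(names) > 1:
        raise ValueError(f"Expected a single instrument, found {sorted(names)}")
    return names.pop() if names else None


def init_filters(data_ids: Iterable[Mapping[str, object]]) -> Optional[FakeInstrument]:
    """Instantiate the instrument named in the data IDs (if any) so its filters are defined."""
    name = find_instrument(data_ids)
    return FakeInstrument.from_name(name) if name is not None else None
```

      For example, init_filters([{"instrument": "HSC", "visit": 1, "detector": 16}]) would populate FILTER_REGISTRY with the two hypothetical HSC entries, while a list of coadd-style data IDs containing only tract, patch and abstract_filter returns None and registers nothing.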

      When the singleton is removed, some related initialization will likely still be needed to register the filters, but we assume that it would allow the same initialization to be done for multiple instruments.

      CC Krzysztof Findeisen, John Parejko, in case they have come across this problem before.


            Activity

            Eli Rykoff added a comment:

            I want to add that it's not just in multiprocessing that the Instrument isn't instantiated. When running my in-progress fgcmcal pipetask I've noticed that I have to instantiate the Instrument manually. So in a single process the Instrument gets created somewhere for some tasks, incidentally, but not through any explicit pattern. Should this be done explicitly somewhere in pipetask, in addition to or instead of in the multiprocessing run?

            Andy Salnikov added a comment:

            Tim Jenness, is it possible to reproduce the crash you saw on DM-26119? I probably don't want to write a unit test for it, but I need a way to check that the fix is going to work.

            Andy Salnikov added a comment:

            Also, what is the "filter yaml" that you mentioned there, and where do I get it from?

            Tim Jenness added a comment:

            You need to be using the DM-26119 branches of obs_base and daf_butler. Then you need to run the ci_hsc_gen3 test with this override yaml:

            datastore:
              composites:
                default: true
                disassembled:
                  ExposureI: true
                  ExposureF: true
                  calexp: true
            

            Run it using python $(command -v scons) --butler-config=override.yaml -j3 (or just scons --butler-config=override.yaml -j3). When you do this you will get lots of disassembled exposures in DATA/shared/ci_hsc_output/YYYYMMDD.../, and some of them will be directories named datasetType.filter. If the filter YAML files in those directories contain "unknown" anywhere, it's not working; if you see the rich content with aliases and canonical names, it's working.

            Unfortunately, my ticket branch has a hack that forces the filter name to be HSC-R regardless, which means the code runs to completion. That doesn't matter, though, because after a few minutes you'll see some .filter YAML files and can inspect them.
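            A quick way to apply the check described above is a small script that walks the output tree. This is a hypothetical helper (find_unknown_filter_files is a made-up name, not part of ci_hsc_gen3); it only assumes the layout from this comment, i.e. directories named datasetType.filter containing YAML files, and it treats the substring "unknown" anywhere in a file as a failure, which also catches the _unknown_ canonical name.

```python
from pathlib import Path
from typing import List, Set


def find_unknown_filter_files(output_root: Path) -> List[Path]:
    """Return YAML files under any "*.filter" directory that mention "unknown".

    Hypothetical helper: a plain substring test is used on purpose, since
    seeing "unknown" anywhere means the filter definitions were not registered.
    """
    bad: Set[Path] = set()  # a set avoids duplicates if .filter dirs are nested
    for filter_dir in output_root.rglob("*.filter"):
        if not filter_dir.is_dir():
            continue
        for yaml_file in filter_dir.rglob("*.yaml"):
            if "unknown" in yaml_file.read_text():
                bad.add(yaml_file)
    return sorted(bad)
```

            For example, find_unknown_filter_files(Path("DATA/shared/ci_hsc_output")) returns an empty list when every filter file has proper content, and otherwise lists the offending files so they can be traced back to the tasks that wrote them.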

            Andy Salnikov added a comment:

            My trivial fix works for most datasets, but for two of them (out of 100+) I'm still getting unknown in the filter YAML files:

            deepCoadd_calexp.filter/0/69/r/deepCoadd_calexp_filter_0_69_r_discrete_ci_hsc_shared_ci_hsc_output_20200731T16h32m32s.yaml
            2:canonicalName: _unknown_

            deepCoadd_calexp.filter/0/69/i/deepCoadd_calexp_filter_0_69_i_discrete_ci_hsc_shared_ci_hsc_output_20200731T16h32m32s.yaml
            2:canonicalName: _unknown_

            Tim Jenness, do you know what could be special about these two, or where I should start digging?

            Andy Salnikov added a comment:

            It looks like these were made by DetectCoaddSourcesTask, and looking at that task's connections I do not see instrument among its dimensions, but there is abstract_filter.

            Tim Jenness added a comment:

            Yes, that is probably expected. I don't think there is anything we can do about it at the moment. I think what you have done will make a huge improvement to consistency of execution.

            Andy Salnikov added a comment:

            OK, thanks, I think it is ready for review then. Jenkins has just started, I'll wait until it finishes.

            Eli Rykoff, this should fix both the single- and multi-process cases.

            Andy Salnikov added a comment:

            JIRA is slow as usual; the PR is here: https://github.com/lsst/ctrl_mpexec/pull/62

            Tim Jenness added a comment:

            Looks great.

            Andy Salnikov added a comment:

            Thanks, I added an assert message. I also had to add an (ugly) monkey-patch to the unit test; I somehow missed a unit test failure in the previous commit (but Jenkins alerted me). You can check the PR again to see if it is OK with you.


              People

              Assignee:
              Andy Salnikov
              Reporter:
              Tim Jenness
              Reviewers:
              Tim Jenness
              Watchers:
              Andy Salnikov, Eli Rykoff, John Parejko, Krzysztof Findeisen, Tim Jenness
              Votes:
              0
              Watchers:
              5

                Dates

                Created:
                Updated:
                Resolved:
