Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-7134

singleFrameDriver is only running with a single process

    Details

    • Type: Bug
    • Status: Done
    • Priority: Major
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: pipe_drivers
    • Labels:
      None
    • Templates:
    • Story Points:
      0.5
    • Epic Link:
    • Sprint:
      DRP F16-3
    • Team:
      Data Release Production

      Description

      singleFrameDriver.py is only using a single process. The problem appears to be the change to use a ButlerInitializedTaskRunner, which doesn't inherit from BatchTaskRunner.

        Issue Links

          Activity

          Hide
          price Paul Price added a comment -

          Jim Bosch, would you please review this? Here's the patch:

          pprice@tiger-sumire:~/LSST/pipe/drivers (tickets/DM-7134=) $ git sub-patch
          commit a04ae120c7ae27811bebcf7e2b4036ea8c6c039c
          Author: Paul Price <price@astro.princeton.edu>
          Date:   Thu Aug 4 15:51:41 2016 -0400
           
              singleFrameDriver: restore lost parallelism
              
              The ButlerInitializedTaskRunner used to run the SingleFrameDriverTask
              doesn't inherit from the BatchTaskRunner, which is what provides
              parallelism. Made a new SingleFrameTaskRunner with the characteristics
              of both BatchTaskRunner and ButlerInitializedTaskRunner using
              multiple inheritance (safe without further work because the
              specialisations in those TaskRunners are orthogonal).
           
          diff --git a/python/lsst/pipe/drivers/singleFrameDriver.py b/python/lsst/pipe/drivers/singleFrameDriver.py
          index b13f64e..60257d2 100644
          --- a/python/lsst/pipe/drivers/singleFrameDriver.py
          +++ b/python/lsst/pipe/drivers/singleFrameDriver.py
          @@ -1,7 +1,7 @@
           from lsst.pipe.base import ArgumentParser, ButlerInitializedTaskRunner
           from lsst.pipe.tasks.processCcd import ProcessCcdTask
           from lsst.pex.config import Config, Field, ConfigurableField, ListField
          -from lsst.ctrl.pool.parallel import BatchParallelTask
          +from lsst.ctrl.pool.parallel import BatchParallelTask, BatchTaskRunner
           
           class SingleFrameDriverConfig(Config):
               processCcd = ConfigurableField(target=ProcessCcdTask, doc="CCD processing task")
          @@ -9,12 +9,17 @@ class SingleFrameDriverConfig(Config):
               ccdKey = Field(dtype=str, default="ccd", doc="DataId key corresponding to a single sensor")
           
           
          +class SingleFrameTaskRunner(BatchTaskRunner, ButlerInitializedTaskRunner):
          +    """Run batches, and initialize Task using a butler"""
          +    pass
          +
          +
           class SingleFrameDriverTask(BatchParallelTask):
               """Process CCDs in parallel
               """
               ConfigClass = SingleFrameDriverConfig
               _DefaultName = "singleFrameDriver"
          -    RunnerClass = ButlerInitializedTaskRunner
          +    RunnerClass = SingleFrameTaskRunner
           
               def __init__(self, butler=None, refObjLoader=None, *args, **kwargs):
                   """!
          

          Show
          price Paul Price added a comment - Jim Bosch , would you please review this? Here's the patch: pprice@tiger-sumire:~/LSST/pipe/drivers (tickets/DM-7134=) $ git sub-patch commit a04ae120c7ae27811bebcf7e2b4036ea8c6c039c Author: Paul Price <price@astro.princeton.edu> Date: Thu Aug 4 15:51:41 2016 -0400   singleFrameDriver: restore lost parallelism The ButlerInitializedTaskRunner used to run the SingleFrameDriverTask doesn't inherit from the BatchTaskRunner, which is what provides parallelism. Made a new SingleFrameTaskRunner with the characteristics of both BatchTaskRunner and ButlerInitializedTaskRunner using multiple inheritance (safe without further work because the specialisations in those TaskRunners are orthogonal).   diff --git a/python/lsst/pipe/drivers/singleFrameDriver.py b/python/lsst/pipe/drivers/singleFrameDriver.py index b13f64e..60257d2 100644 --- a/python/lsst/pipe/drivers/singleFrameDriver.py +++ b/python/lsst/pipe/drivers/singleFrameDriver.py @@ -1,7 +1,7 @@ from lsst.pipe.base import ArgumentParser, ButlerInitializedTaskRunner from lsst.pipe.tasks.processCcd import ProcessCcdTask from lsst.pex.config import Config, Field, ConfigurableField, ListField -from lsst.ctrl.pool.parallel import BatchParallelTask +from lsst.ctrl.pool.parallel import BatchParallelTask, BatchTaskRunner class SingleFrameDriverConfig(Config): processCcd = ConfigurableField(target=ProcessCcdTask, doc="CCD processing task") @@ -9,12 +9,17 @@ class SingleFrameDriverConfig(Config): ccdKey = Field(dtype=str, default="ccd", doc="DataId key corresponding to a single sensor") +class SingleFrameTaskRunner(BatchTaskRunner, ButlerInitializedTaskRunner): + """Run batches, and initialize Task using a butler""" + pass + + class SingleFrameDriverTask(BatchParallelTask): """Process CCDs in parallel """ ConfigClass = SingleFrameDriverConfig _DefaultName = "singleFrameDriver" - RunnerClass = ButlerInitializedTaskRunner + RunnerClass = SingleFrameTaskRunner def __init__(self, butler=None, refObjLoader=None, *args, **kwargs): """!
          Hide
          jbosch Jim Bosch added a comment - - edited

          My only concern is that I think you might need to modify BatchTaskRunner.__init__ to use super to ensure that TaskRunner.__init__ is invoked properly; it'd certainly be good practice to do so, even if we're skating by on an edge case. In fact, I'd feel a bit safer if all three derived classes explicitly did that, but I've got to imagine Python also handles multiple inheritance correctly if you don't define __init__ at all.

          Show
          jbosch Jim Bosch added a comment - - edited My only concern is that I think you might need to modify BatchTaskRunner.__init__ to use super to ensure that TaskRunner.__init__ is invoked properly; it'd certainly be good practice to do so, even if we're skating by on an edge case. In fact, I'd feel a bit safer if all three derived classes explicitly did that, but I've got to imagine Python also handles multiple inheritance correctly if you don't define __init__ at all.
          Hide
          price Paul Price added a comment -

          I don't think that's necessary.

          • BatchTaskRunner overrides __init__, run and __call__.
          • ButlerInitializedTaskRunner overrides only makeTask.

          These lists are orthogonal, so the correct parent should be firing without having to worry about the MRO. Since you mentioned it explictly, take the example of __init__:

          • SingleFrameTaskRunner.__init__ isn't defined, so we go looking up the MRO.
          • BatchTaskRunner is next in the MRO, and it defines __init__, so that fires.
          • That explicitly calls TaskRunner.__init__ (not using super).
          • That doesn't call any other __init__ method.
            Note that nothing is calling ButlerInitializedTaskRunner.__init__, and that's good because it doesn't override that method. So I think everything fires that needs to, and nothing fires that shouldn't. If we used super somewhere but not everywhere, that would create trouble, but I think this does the right thing now without any changes required.

          Please let me know if you disagree or still have concerns. I'll wait on your final OK before merging.

          Show
          price Paul Price added a comment - I don't think that's necessary. BatchTaskRunner overrides __init__ , run and __call__ . ButlerInitializedTaskRunner overrides only makeTask . These lists are orthogonal, so the correct parent should be firing without having to worry about the MRO. Since you mentioned it explictly, take the example of __init__ : SingleFrameTaskRunner.__init__ isn't defined, so we go looking up the MRO. BatchTaskRunner is next in the MRO, and it defines __init__ , so that fires. That explicitly calls TaskRunner.__init__ (not using super ). That doesn't call any other __init__ method. Note that nothing is calling ButlerInitializedTaskRunner.__init__ , and that's good because it doesn't override that method. So I think everything fires that needs to, and nothing fires that shouldn't. If we used super somewhere but not everywhere, that would create trouble, but I think this does the right thing now without any changes required. Please let me know if you disagree or still have concerns. I'll wait on your final OK before merging.
          Hide
          jbosch Jim Bosch added a comment -

          I don't disagree with your analysis that the current code is safe, but I still worry that it's fragile w.r.t. the addition of new subclasses. That said, TaskRunner is perhaps not long for this world with SuperTask on the horizon, so it's probably not necessary to fix this now.

          Show
          jbosch Jim Bosch added a comment - I don't disagree with your analysis that the current code is safe, but I still worry that it's fragile w.r.t. the addition of new subclasses. That said, TaskRunner is perhaps not long for this world with SuperTask on the horizon, so it's probably not necessary to fix this now.
          Hide
          price Paul Price added a comment -

          Merged to master.

          Show
          price Paul Price added a comment - Merged to master.
          Hide
          price Paul Price added a comment -

          Thanks, I'll go ahead and merge now.

          I'm not sure how this is any more fragile than usual, nor how super would fix it (using super in a class hierarchy that doesn't all use super is dangerous, and I thought you were against using it for that reason); perhaps you could explain that offline for me?

          Show
          price Paul Price added a comment - Thanks, I'll go ahead and merge now. I'm not sure how this is any more fragile than usual, nor how super would fix it (using super in a class hierarchy that doesn't all use super is dangerous, and I thought you were against using it for that reason); perhaps you could explain that offline for me?
          Hide
          jbosch Jim Bosch added a comment -

          To be honest, while I do remember being unhappy with super in the past, I don't remember the arguments for and against anymore, and I don't think that unease extended to diamond inheritance when all classes have a nontrivial constructors; I think super is till the only way to handle that situation correctly.

          I wonder what the situation is in Python 3 (or futurized Python 2), where I hear super is better - Tim Jenness?

          Show
          jbosch Jim Bosch added a comment - To be honest, while I do remember being unhappy with super in the past, I don't remember the arguments for and against anymore, and I don't think that unease extended to diamond inheritance when all classes have a nontrivial constructors; I think super is till the only way to handle that situation correctly. I wonder what the situation is in Python 3 (or futurized Python 2), where I hear super is better - Tim Jenness ?
          Show
          tjenness Tim Jenness added a comment - - edited Python 3 super docs are at https://docs.python.org/3/library/functions.html?highlight=super#super which then tells you to read https://rhettinger.wordpress.com/2011/05/26/super-considered-super/

            People

            • Assignee:
              price Paul Price
              Reporter:
              price Paul Price
              Reviewers:
              Jim Bosch
              Watchers:
              Jim Bosch, Paul Price, Tim Jenness
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development

                  Agile