Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-10633

Increase CmdLineTask multiprocessing timeout

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: pipe_base
    • Labels:
      None
    • Story Points:
      1
    • Team:
      Data Release Production

      Description

      The timeout in CmdLineTask catches a lot of people unawares, and causes lots of confusion and annoyance. It's necessary (because of that python multiprocessing bug) to have some value, but as soon as we stick a value in people manage to exceed it. Merlin Fisher-Levine had a good idea: we make the value a number per target, which will make it harder to exceed when people throw lots of data at -j. That just means calling getTargetList a bit earlier.

        Attachments

          Issue Links

            Activity

            Hide
            price Paul Price added a comment -

            On Python 3.5.x, this change produces:

            OverflowError: timestamp too large to convert to C _PyTime_t
            

            https://bugs.python.org/issue25155

            Show
            price Paul Price added a comment - On Python 3.5.x, this change produces: OverflowError: timestamp too large to convert to C _PyTime_t https://bugs.python.org/issue25155
            Hide
            price Paul Price added a comment -

            Going to reduce the overkill factor.

            Show
            price Paul Price added a comment - Going to reduce the overkill factor.
            Hide
            price Paul Price added a comment -

            Sorry to bother you again, Russell Owen, but would you please have a look at these simple changes?

            price@pap-laptop:~/LSST/pipe_base (tickets/DM-10633=) $ git sub-patch
            commit 355fc941f07c724a4c25081f0abfa8bb661c97e6
            Author: Paul Price <price@astro.princeton.edu>
            Date:   Fri Jun 2 14:01:13 2017 -0400
             
                TaskRunner: revise insane timeout
                
                This value is more reasonable, and doesn't trigger Y2K38 bugs (e.g.,
                https://bugs.python.org/issue25155 ), but should be large enough to
                prevent annoyance.
             
            diff --git a/python/lsst/pipe/base/cmdLineTask.py b/python/lsst/pipe/base/cmdLineTask.py
            index cfc4d26..34d6174 100644
            --- a/python/lsst/pipe/base/cmdLineTask.py
            +++ b/python/lsst/pipe/base/cmdLineTask.py
            @@ -140,7 +140,7 @@ class TaskRunner(object):
                 [1] http://bugs.python.org/issue8296
                 [2] http://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool)
                 """
            -    TIMEOUT = 999999999999  # Default timeout (sec) for multiprocessing
            +    TIMEOUT = 3600*24*30  # Default timeout (sec) for multiprocessing
             
                 def __init__(self, TaskClass, parsedCmd, doReturnResults=False):
                     """!Construct a TaskRunner
             
            commit b33ffe24b6716cee286993c3493f7da27d464296
            Author: Paul Price <price@astro.princeton.edu>
            Date:   Fri Jun 2 14:01:42 2017 -0400
             
                TaskRunner: fix doc
                
                We don't use TIMEOUT_DEFAULT, but TIMEOUT (there is no TIMEOUT_DEFAULT).
             
            diff --git a/python/lsst/pipe/base/cmdLineTask.py b/python/lsst/pipe/base/cmdLineTask.py
            index 34d6174..d5f93d1 100644
            --- a/python/lsst/pipe/base/cmdLineTask.py
            +++ b/python/lsst/pipe/base/cmdLineTask.py
            @@ -135,7 +135,7 @@ class TaskRunner(object):
                 Due to a python bug [1], handling a KeyboardInterrupt properly requires
                 specifying a timeout [2]. This timeout (in sec) can be specified as the
                 "timeout" element in the output from ArgumentParser (the "parsedCmd"), if
            -    available, otherwise we use TaskRunner.TIMEOUT_DEFAULT.
            +    available, otherwise we use TaskRunner.TIMEOUT.
             
                 [1] http://bugs.python.org/issue8296
                 [2] http://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool)
            

            Show
            price Paul Price added a comment - Sorry to bother you again, Russell Owen , but would you please have a look at these simple changes? price@pap-laptop:~/LSST/pipe_base (tickets/DM-10633=) $ git sub-patch commit 355fc941f07c724a4c25081f0abfa8bb661c97e6 Author: Paul Price <price@astro.princeton.edu> Date: Fri Jun 2 14:01:13 2017 -0400   TaskRunner: revise insane timeout This value is more reasonable, and doesn't trigger Y2K38 bugs (e.g., https://bugs.python.org/issue25155 ), but should be large enough to prevent annoyance.   diff --git a/python/lsst/pipe/base/cmdLineTask.py b/python/lsst/pipe/base/cmdLineTask.py index cfc4d26..34d6174 100644 --- a/python/lsst/pipe/base/cmdLineTask.py +++ b/python/lsst/pipe/base/cmdLineTask.py @@ -140,7 +140,7 @@ class TaskRunner(object): [1] http://bugs.python.org/issue8296 [2] http://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool) """ - TIMEOUT = 999999999999 # Default timeout (sec) for multiprocessing + TIMEOUT = 3600*24*30 # Default timeout (sec) for multiprocessing def __init__(self, TaskClass, parsedCmd, doReturnResults=False): """!Construct a TaskRunner   commit b33ffe24b6716cee286993c3493f7da27d464296 Author: Paul Price <price@astro.princeton.edu> Date: Fri Jun 2 14:01:42 2017 -0400   TaskRunner: fix doc We don't use TIMEOUT_DEFAULT, but TIMEOUT (there is no TIMEOUT_DEFAULT).   diff --git a/python/lsst/pipe/base/cmdLineTask.py b/python/lsst/pipe/base/cmdLineTask.py index 34d6174..d5f93d1 100644 --- a/python/lsst/pipe/base/cmdLineTask.py +++ b/python/lsst/pipe/base/cmdLineTask.py @@ -135,7 +135,7 @@ class TaskRunner(object): Due to a python bug [1], handling a KeyboardInterrupt properly requires specifying a timeout [2]. This timeout (in sec) can be specified as the "timeout" element in the output from ArgumentParser (the "parsedCmd"), if - available, otherwise we use TaskRunner.TIMEOUT_DEFAULT. + available, otherwise we use TaskRunner.TIMEOUT. [1] http://bugs.python.org/issue8296 [2] http://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool)
            Hide
            rowen Russell Owen added a comment -

            (The pull request looks weird because this is not rebased to master, so I used Source Tree and my comments are below.)

            Your changes look fine. I confess I was uneasy about the huge number you used initially, but I figured if it passed Jenkins then I had no valid grounds for complaint. So I much prefer your new "long enough but not crazy" value. Thank you!

            One minor suggest which you can take or leave: rename TIMEOUT to DEFAULT_TIMEOUT or DEFAULT_TIMEOUT_SEC.

            Show
            rowen Russell Owen added a comment - (The pull request looks weird because this is not rebased to master, so I used Source Tree and my comments are below.) Your changes look fine. I confess I was uneasy about the huge number you used initially, but I figured if it passed Jenkins then I had no valid grounds for complaint. So I much prefer your new "long enough but not crazy" value. Thank you! One minor suggest which you can take or leave: rename TIMEOUT to DEFAULT_TIMEOUT or DEFAULT_TIMEOUT_SEC.
            Hide
            price Paul Price added a comment -

            Yes, I should have listened to you the first time. But it passed Jenkins.... I blame the python bug that it triggers!

            Having just fixed up the docs for TIMEOUT, I think I'd prefer to leave the name (and I don't want to have to go searching for uses or RFC it).

            Merged to master.

            Show
            price Paul Price added a comment - Yes, I should have listened to you the first time. But it passed Jenkins.... I blame the python bug that it triggers! Having just fixed up the docs for TIMEOUT , I think I'd prefer to leave the name (and I don't want to have to go searching for uses or RFC it). Merged to master.

              People

              Assignee:
              price Paul Price
              Reporter:
              price Paul Price
              Reviewers:
              Russell Owen
              Watchers:
              Paul Price, Russell Owen
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.