Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-10834

CmdLineTask -j multiprocessing hangs with long data ID lists

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: pipe_base
    • Labels:
      None

      Description

      processCcd.py seems to reliably hang when given long data ID lists (possibly only on Python 2, but on at least Ubuntu Linux and OSX). To reproduce:

      • Run scons on ci_hsc to create the raw data repository (can kill it after it starts running ProcessCcdTask).
      • Run

        processCcd.py DATA -j2 --rerun mpbug --id visit=903334..903338:2 --id visit=903342..903346:2   --id visit=903986..903990:2 --id visit=904010^904014
        

        Attachments

          Issue Links

            Activity

            Hide
            price Paul Price added a comment -

            Pim Schellart [X], would you please review this?

            price@pap-laptop:~/LSST/afw (tickets/DM-10834=) $ git sub-patch
            commit 8811f041ba81d4167d46e8bfc93b458038c63e06
            Author: Paul Price <price@astro.princeton.edu>
            Date:   Mon Jun 12 17:23:39 2017 -0400
             
                pybind: clear error if we're not going to use it
                
                To fail to do so results in a deadlock when using python multiprocessing.
             
            diff --git a/python/lsst/afw/table/schema/schema.cc b/python/lsst/afw/table/schema/schema.cc
            index 4123f7f3f..a523b04fb 100644
            --- a/python/lsst/afw/table/schema/schema.cc
            +++ b/python/lsst/afw/table/schema/schema.cc
            @@ -346,6 +346,7 @@ void declareSchema(py::module &mod) {
                     try {
                         self.attr("find")(key);
                     } catch (py::error_already_set &err) {
            +            err.clear();
                         return false;
                     }
                     return true;
            

            Show
            price Paul Price added a comment - Pim Schellart [X] , would you please review this? price@pap-laptop:~/LSST/afw (tickets/DM-10834=) $ git sub-patch commit 8811f041ba81d4167d46e8bfc93b458038c63e06 Author: Paul Price <price@astro.princeton.edu> Date: Mon Jun 12 17:23:39 2017 -0400   pybind: clear error if we're not going to use it To fail to do so results in a deadlock when using python multiprocessing.   diff --git a/python/lsst/afw/table/schema/schema.cc b/python/lsst/afw/table/schema/schema.cc index 4123f7f3f..a523b04fb 100644 --- a/python/lsst/afw/table/schema/schema.cc +++ b/python/lsst/afw/table/schema/schema.cc @@ -346,6 +346,7 @@ void declareSchema(py::module &mod) { try { self.attr("find")(key); } catch (py::error_already_set &err) { + err.clear(); return false; } return true;
            Hide
            pschella Pim Schellart [X] (Inactive) added a comment -

            Looks great!
            Thank you very much for looking into this! Definitely a gold star!

            Show
            pschella Pim Schellart [X] (Inactive) added a comment - Looks great! Thank you very much for looking into this! Definitely a gold star!
            Hide
            price Paul Price added a comment -

            Thanks Pim.

            Jenkins passed for py2 and py3 (except for ci_hsc on py3, which doesn't work yet, silly me).

            Merged to master.

            Show
            price Paul Price added a comment - Thanks Pim. Jenkins passed for py2 and py3 (except for ci_hsc on py3, which doesn't work yet, silly me). Merged to master.
            Hide
            jbosch Jim Bosch added a comment -

            Excellent work, Paul Price.

            I'm embarrassed I didn't see and recommend this change: I looked at this exact piece of code, complained that it was confusing that ~error_already_set restored the Python error state, and then didn't notice that restoring the Python error state here is entirely the wrong thing to do (since having a Python error set and not returning NULL is considered a serious problem by the interpreter, and moreover we don't want to re-raise this exception). Because the old version of this code is the natural C++ idiom for catching and not re-throwing an excpetion, we should look for other places where we've used it incorrectly.

            Pim Schellart [X], Paul Price, I think we should consider asking pybind11 to change the behavior of ~error_already_set, which would then require an explicit call to restore when you want to re-raise in Python instead of an explicit call to clear when you do not. Do you agree?

            Show
            jbosch Jim Bosch added a comment - Excellent work, Paul Price . I'm embarrassed I didn't see and recommend this change: I looked at this exact piece of code, complained that it was confusing that ~error_already_set restored the Python error state, and then didn't notice that restoring the Python error state here is entirely the wrong thing to do (since having a Python error set and not returning NULL is considered a serious problem by the interpreter, and moreover we don't want to re-raise this exception). Because the old version of this code is the natural C++ idiom for catching and not re-throwing an excpetion, we should look for other places where we've used it incorrectly. Pim Schellart [X] , Paul Price , I think we should consider asking pybind11 to change the behavior of ~error_already_set , which would then require an explicit call to restore when you want to re-raise in Python instead of an explicit call to clear when you do not. Do you agree?
            Hide
            pschella Pim Schellart [X] (Inactive) added a comment -

            Jim Bosch, yes I definitely agree.

            Show
            pschella Pim Schellart [X] (Inactive) added a comment - Jim Bosch , yes I definitely agree.

              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              jbosch Jim Bosch
              Reviewers:
              Pim Schellart [X] (Inactive)
              Watchers:
              Jim Bosch, John Swinbank, Jonathan Sick, Merlin Fisher-Levine, Paul Price, Pim Schellart [X] (Inactive), Robert Lupton
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.