DM-15444

Investigate mysterious GIL management in pybind11 wrappers

    Details

    • Type: Story
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: None

      Description

      We've recently seen a few tickets (DM-15419, DM-15406) in which it seems we have to release or avoid acquiring the Python Global Interpreter Lock (GIL) to avoid a crash or deadlock, even though only one thread should be in play.  We've fixed these without really understanding why the fix works, which is slightly worrying.

      This ticket is primarily intended to record and organize the places in the code base where this kind of workaround/fix has been necessary, so we can easily update them if we discover a better way to approach the problem.  As it's possible these problems are due to one or more pybind11 bugs (though that's hard for us to determine), I don't think it's a high priority for us to investigate them ourselves, but we'll likely want to keep careful records so we can respond appropriately to any improvements coming from pybind11 upstream.
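
One way to keep such records current is to scan the wrapper sources for pybind11's RAII GIL helpers, `py::gil_scoped_release` and `py::gil_scoped_acquire`. The following is a minimal sketch, not an existing tool: the function names `find_gil_sites` and `scan_tree` and the list of file suffixes are hypothetical, and it assumes the workarounds use those helpers by name (directly or via `py::call_guard`).

```python
import re
from pathlib import Path

# pybind11's RAII GIL helpers. Matching the bare names also catches
# py::call_guard<py::gil_scoped_release>() usages.
GIL_PATTERN = re.compile(r"\bgil_scoped_(release|acquire)\b")


def find_gil_sites(text, filename="<string>"):
    """Return (filename, line_number, stripped_line) for each line naming a GIL helper."""
    return [
        (filename, lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if GIL_PATTERN.search(line)
    ]


def scan_tree(root, suffixes=(".cc", ".cpp", ".h", ".hpp")):
    """Walk a source tree (hypothetical layout) and collect GIL-helper sites from C++ files."""
    sites = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            sites.extend(find_gil_sites(path.read_text(errors="replace"), str(path)))
    return sites
```

Running `scan_tree` over a package checkout would give a list of candidate sites to review by hand. It cannot tell a deliberate workaround from ordinary GIL handling, so the output is only a starting point for the record.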

            Activity

            Jim Bosch added a comment:

            Of the problem reports gathered here,

            • DM-15406 involves a now-removed package (meas_mosaic), so it's going to be nearly impossible to reproduce. I still don't understand why the fix worked, but I don't think it's worth anyone's time to revisit it without a how-to-reproduce.
            • DM-15419 does look easy to reproduce - it was a unit test failure in utils (now cpputils), and it would probably be worth trying to revert the not-obviously-our-problem fix from that ticket to see if modern pybind11 now takes care of it for us.
            • DM-15478 looks like it ought to still be reproducible, too, but the chances that pybind11 has fully fixed the problem identified there seem small, since the issue Tim Jenness created has gone completely unnoticed. And the more serious part of the problem seemed to have been already fixed on pybind11 master as of the resolution of that ticket.

            So, I think this ticket is worth keeping around for the DM-15419 followup, but the rest probably isn't worth anyone's time.


              People

              Assignee: Unassigned
              Reporter: Jim Bosch
              Watchers: Jim Bosch, Paul Price, Tim Jenness
              Votes: 0
