Details
- Type: Story
- Status: To Do
- Resolution: Unresolved
- Fix Version/s: None
- Component/s: None
- Labels:
Description
We've recently seen a few tickets (DM-15419, DM-15406) in which it seems we have to release or avoid acquiring the Python Global Interpreter Lock (GIL) to avoid a crash or deadlock, even though only one thread should be in play. We've fixed these without really understanding why the fix works, which is slightly worrying.
This ticket is primarily intended to record and organize the places in the code base where this kind of workaround/fix has been necessary, so we can easily update them if we discover a better way to approach the problem. It's possible these problems are due to one or more pybind11 bugs, but it's hard for us to determine that, so I don't think it's a high priority for us to investigate ourselves. I do think we'll want to keep careful records so we can respond appropriately to any improvements coming from pybind11 upstream.
Of the problem reports gathered here:
- DM-15406 involves a now-removed package (meas_mosaic), so it's going to be nearly impossible to reproduce. I still don't understand why the fix worked, but I don't think it's worth anyone's time to revisit it without a how-to-reproduce.
- DM-15419 does look easy to reproduce: it was a unit test failure in utils (now cpputils), and it probably would be worth trying to revert the not-obviously-our-problem fix from that ticket to see if modern pybind11 now takes care of it for us.
- DM-15478 looks like it ought to still be reproducible, too, but the chances that pybind11 has fully fixed the problem identified there seem small, since the issue Tim Jenness created has gone completely unnoticed. And the more serious part of the problem seemed to have been already fixed on pybind11 master as of the resolution of that ticket.
So, I think this ticket is worth keeping around for the DM-15419 followup, but the rest probably isn't worth anyone's time.