Details
-
Type:
Story
-
Status: Done
-
Resolution: Done
-
Fix Version/s: None
-
Component/s: afw
-
Labels:
-
Story Points:1
-
Team:Architecture
Description
If astropy.units fails to import and a raises an exception (eg if warnings are converted to errors) when being imported via Schema.cc during import of lsst.afw.image an abort signal is received:
terminating with uncaught exception of type pybind11::error_already_set: SystemError: <built-in method __contains__ of dict object at 0x10049b900> returned a result with an error set
|
Stepping through schema.cc the line triggering this is:
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
|
frame #0: 0x0000000116e3d53d schema.so`void lsst::afw::table::(anonymous namespace)::declareSchemaType<unsigned char>(mod=0x00007ffeefbf33c0) at schema.cc:187 [opt]
|
184 std::string suffix = FieldBase<T>::getTypeString();
|
185 py::str pySuffix(suffix);
|
186
|
-> 187 py::object astropyUnit = py::module::import("astropy.units").attr("Unit");
|
188
|
189 // FieldBase
|
190 PyFieldBase<T> clsFieldBase(mod, ("FieldBase" + suffix).c_str());
|
|
* frame #0: 0x0000000116e3d53d schema.so`void lsst::afw::table::(anonymous namespace)::declareSchemaType<unsigned char>(mod=0x00007ffeefbf33c0) at schema.cc:187 [opt]
|
frame #1: 0x0000000116e3d143 schema.so`lsst::afw::table::(anonymous namespace)::pybind11_init_schema(mod=0x00007ffeefbf33c0) at schema.cc:398 [opt]
|
frame #2: 0x0000000116e3cf5d schema.so`::PyInit_schema() at schema.cc:390 [opt]
|
|
Something in the pybind11 layer is failing to catch the python exception and forward it on.
Depending on your system this can be reproduced with:
import warnings |
warnings.simplefilter("error") |
import lsst.afw.image |
Given the reference to schema.cc and _contains_ I initially thought this was related to DM-15406 and there is some commentary there.
Attachments
Issue Links
Activity
Also, apCorrMap.cc:44 is the code importing lsst.afw.table.io which is eventually importing schema and then astropy.units.
Funnily enough a Jim Bosch filed exactly this bug on pybind11: https://github.com/pybind/pybind11/issues/686 and this was allegedly fixed over a year ago...
Paul Price suggested adding e.restore() before calling PyErr_SetString() in PYBIND11_MODULE. To make investigating this a bit quicker I worked on daf_base and added an import of "astropy.units" into the C++ code for PropertyList. Without the patch I got the normal SIGABRT. With the patch I got:
>>> import warnings
|
>>> warnings.simplefilter("error")
|
>>> import lsst.daf.base
|
Traceback (most recent call last):
|
File "<stdin>", line 1, in <module>
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/build/daf_base/python/lsst/daf/base/__init__.py", line 27, in <module>
|
from .propertyContainer import *
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/build/daf_base/python/lsst/daf/base/propertyContainer/__init__.py", line 25, in <module>
|
from .propertyList import *
|
ImportError: ImportWarning: Not importing directory /Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/mpl_toolkits: missing __init__
|
|
At:
|
<frozen importlib._bootstrap_external>(426): _find_module_shim
|
/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py(2090): _handle_ns
|
/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py(2173): declare_namespace
|
/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py(2633): activate
|
/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py(3123): <genexpr>
|
for comparison, this is the output from loading astropy.units itself:
>>> import warnings
|
>>> warnings.simplefilter("error")
|
>>> import astropy.units
|
Traceback (most recent call last):
|
File "<stdin>", line 1, in <module>
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/astropy/__init__.py", line 118, in <module>
|
_check_numpy()
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/astropy/__init__.py", line 107, in _check_numpy
|
requirement_met = minversion(numpy, __minimum_numpy_version__)
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/astropy/utils/introspection.py", line 147, in minversion
|
from pkg_resources import parse_version
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3095, in <module>
|
@_call_aside
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3079, in _call_aside
|
f(*args, **kwargs)
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3123, in _initialize_master_working_set
|
for dist in working_set
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3123, in <genexpr>
|
for dist in working_set
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2633, in activate
|
declare_namespace(pkg)
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2170, in declare_namespace
|
_handle_ns(packageName, path_item)
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2090, in _handle_ns
|
loader = importer.find_module(packageName)
|
File "<frozen importlib._bootstrap_external>", line 426, in _find_module_shim
|
ImportWarning: Not importing directory /Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/mpl_toolkits: missing __init__
|
The native python trace is more informative and the pybind11 trace seems to be reversed. I will do a build of afw and see what happens with the new pybind11.
I've built afw and confirm that it works, although it's not correct in that you get:
ImportError: ImportError: ImportWarning: Not importing directory /Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/mpl_toolkits: missing __init__
|
so the import of astropy is generating a fatal ImportWarning, then pybind11 is catching that for the import of schema and adding its own ImportError to the string, and then apCorrMap is catching that and adding another ImportError.
I would argue that in the case of a python exception being passed through C++ pybind11 should not be changing the exception type at all. How is somebody meant to be able to know that importing something directly gives a ImportWarning but importing another package that imports this package now suddenly calls it ImportError.
As an experiment I removed the PyErr_SetString() completely from the already set clause and just left the e.restore() and then everything worked perfectly and I got the traceback I expected from python without truncation:
>>> import warnings
|
>>> warnings.simplefilter("error")
|
>>> import lsst.daf.base
|
Traceback (most recent call last):
|
File "<stdin>", line 1, in <module>
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/build/daf_base/python/lsst/daf/base/__init__.py", line 27, in <module>
|
from .propertyContainer import *
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/build/daf_base/python/lsst/daf/base/propertyContainer/__init__.py", line 25, in <module>
|
from .propertyList import *
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/astropy/__init__.py", line 118, in <module>
|
_check_numpy()
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/astropy/__init__.py", line 107, in _check_numpy
|
requirement_met = minversion(numpy, __minimum_numpy_version__)
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/astropy/utils/introspection.py", line 147, in minversion
|
from pkg_resources import parse_version
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3095, in <module>
|
@_call_aside
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3079, in _call_aside
|
f(*args, **kwargs)
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3123, in _initialize_master_working_set
|
for dist in working_set
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3123, in <genexpr>
|
for dist in working_set
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2633, in activate
|
declare_namespace(pkg)
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2170, in declare_namespace
|
_handle_ns(packageName, path_item)
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2090, in _handle_ns
|
loader = importer.find_module(packageName)
|
File "<frozen importlib._bootstrap_external>", line 426, in _find_module_shim
|
ImportWarning: Not importing directory /Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/mpl_toolkits: missing __init__
|
>>>
|
This correctly shows that lsst.daf.base is trying to import astropy (in my hacked up daf_base) and we end up with a correct exception type of ImportWarning. pybind11 is making things worse by trying to force the exception type.
This is the output from importing lsst.afw.image with the e.restore() added but nothing removed:
$ python -Wd xxx
|
/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/DarwinX86/python_psutil/5.4.3+2/lib/python/site.py:26: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
|
import imp # Avoid import loop in Python >= 3.3
|
Traceback (most recent call last):
|
File "xxx", line 3, in <module>
|
import lsst.afw.image
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/DarwinX86/afw/16.0-26-gf825f1c6c+2/python/lsst/afw/image/__init__.py", line 25, in <module>
|
from .apCorrMap import *
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/DarwinX86/afw/16.0-26-gf825f1c6c+2/python/lsst/afw/image/apCorrMap/__init__.py", line 23, in <module>
|
from .apCorrMap import *
|
ImportError: ImportError: ImportWarning: Not importing directory /Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/mpl_toolkits: missing __init__
|
|
At:
|
<frozen importlib._bootstrap_external>(426): _find_module_shim
|
/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py(2090): _handle_ns
|
/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py(2173): declare_namespace
|
/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py(2633): activate
|
/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py(3123): <genexpr>
|
|
|
At:
|
/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/DarwinX86/afw/16.0-26-gf825f1c6c+2/python/lsst/afw/table/schema/__init__.py(23): <module>
|
<frozen importlib._bootstrap>(219): _call_with_frames_removed
|
<frozen importlib._bootstrap_external>(678): exec_module
|
<frozen importlib._bootstrap>(665): _load_unlocked
|
<frozen importlib._bootstrap>(955): _find_and_load_unlocked
|
<frozen importlib._bootstrap>(971): _find_and_load
|
/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/DarwinX86/afw/16.0-26-gf825f1c6c+2/python/lsst/afw/table/__init__.py(24): <module>
|
<frozen importlib._bootstrap>(219): _call_with_frames_removed
|
<frozen importlib._bootstrap_external>(678): exec_module
|
<frozen importlib._bootstrap>(665): _load_unlocked
|
<frozen importlib._bootstrap>(955): _find_and_load_unlocked
|
<frozen importlib._bootstrap>(971): _find_and_load
|
<frozen importlib._bootstrap>(219): _call_with_frames_removed
|
<frozen importlib._bootstrap>(941): _find_and_load_unlocked
|
<frozen importlib._bootstrap>(971): _find_and_load
|
<frozen importlib._bootstrap>(219): _call_with_frames_removed
|
<frozen importlib._bootstrap_external>(922): create_module
|
<frozen importlib._bootstrap>(571): module_from_spec
|
<frozen importlib._bootstrap>(658): _load_unlocked
|
<frozen importlib._bootstrap>(955): _find_and_load_unlocked
|
<frozen importlib._bootstrap>(971): _find_and_load
|
/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/DarwinX86/afw/16.0-26-gf825f1c6c+2/python/lsst/afw/image/apCorrMap/__init__.py(23): <module>
|
<frozen importlib._bootstrap>(219): _call_with_frames_removed
|
<frozen importlib._bootstrap_external>(678): exec_module
|
<frozen importlib._bootstrap>(665): _load_unlocked
|
<frozen importlib._bootstrap>(955): _find_and_load_unlocked
|
<frozen importlib._bootstrap>(971): _find_and_load
|
/Volumes/ExternalSSD/Users/timj/work/lsstsw/stack/DarwinX86/afw/16.0-26-gf825f1c6c+2/python/lsst/afw/image/__init__.py(25): <module>
|
<frozen importlib._bootstrap>(219): _call_with_frames_removed
|
<frozen importlib._bootstrap_external>(678): exec_module
|
<frozen importlib._bootstrap>(665): _load_unlocked
|
<frozen importlib._bootstrap>(955): _find_and_load_unlocked
|
<frozen importlib._bootstrap>(971): _find_and_load
|
with the PyErr_SetString call removed and with only the e.restore() I get:
>>> import lsst.afw.image
|
Traceback (most recent call last):
|
File "<stdin>", line 1, in <module>
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/build/afw/python/lsst/afw/image/__init__.py", line 25, in <module>
|
from .apCorrMap import *
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/build/afw/python/lsst/afw/image/apCorrMap/__init__.py", line 23, in <module>
|
from .apCorrMap import *
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/build/afw/python/lsst/afw/table/__init__.py", line 24, in <module>
|
from .schema import *
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/build/afw/python/lsst/afw/table/schema/__init__.py", line 23, in <module>
|
from .schema import *
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/astropy/__init__.py", line 118, in <module>
|
_check_numpy()
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/astropy/__init__.py", line 107, in _check_numpy
|
requirement_met = minversion(numpy, __minimum_numpy_version__)
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/astropy/utils/introspection.py", line 147, in minversion
|
from pkg_resources import parse_version
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3095, in <module>
|
@_call_aside
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3079, in _call_aside
|
f(*args, **kwargs)
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3123, in _initialize_master_working_set
|
for dist in working_set
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3123, in <genexpr>
|
for dist in working_set
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2633, in activate
|
declare_namespace(pkg)
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2170, in declare_namespace
|
_handle_ns(packageName, path_item)
|
File "/Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2090, in _handle_ns
|
loader = importer.find_module(packageName)
|
File "<frozen importlib._bootstrap_external>", line 426, in _find_module_shim
|
ImportWarning: Not importing directory /Volumes/ExternalSSD/Users/timj/work/lsstsw/miniconda/envs/lsst-scipipe/lib/python3.6/site-packages/mpl_toolkits: missing __init__
|
which is much much better.
Jim Bosch this patch file seems to fix everything for me (mostly thanks to suggestions from Paul Price). What do you want to do? Do we know why pybind11 upstream doesn't do this? It implies to me they aren't testing failed imports at all. If failed imports are being tested then I'm confused why they don't see my problem. I definitely think it's wrong to rewrite the python import exception as an ImportError regardless of the error python raised, and the afw import shows that you end up with multiple ImportErrors stacked up if you do it the current way.
Have we asked to see whether pybind11 wants this? I would guess that they do, especially if we can convince them of this argument:
I definitely think it's wrong to rewrite the python import exception as an ImportError regardless of the error python raised, and the afw import shows that you end up with multiple ImportErrors stacked up if you do it the current way.
If that argument isn't enough, though, it may be hard to sell them via the "without e.restore() it crashes" argument - I still do not see how that could possibly relate to the uncaught exception. I'd guess it's GIL-related (the e.restore() prevents error_already_set's destructor from trying to acquire the GIL), but that's just because I already know I don't understand other GIL problems we have - I do not see a way to connect this fix to the original problem.
So the best case scenario is to get the fix upstream with the reasonable motivation that it provides better error messages. If they don't want it, we probably need to apply the patch ourselves, at least until we understand why it fixed the problem.
Surely we can agree that "SIGABRT because of an import problem" is a real problem. The next issue is the exception rewriting (clearly it's more pythonic if they don't rewrite). I'm not sure how the GIL fits into this (I haven't mentioned the GIL). I suppose I need to try the current pybind11 – it's very easy to reproduce the simple sigabrt.
Surely we can agree that "SIGABRT because of an import problem" is a real problem.
Oh, absolutely. It's just one that I think will require a minimal reproducing example before we have a hope of convincing upstream that this is the right fix rather than a lucky one (just like the other tickets associated with DM-15444).
(All of this assumes that none of us actually understand why adding e.restore() prevents a C++ exception from propagating up to terminate - if you or Paul Price does and I'm just not seeing it, please speak up.)
Good news is that pybind11 master does not SIGABRT – you get the annoyingly incomplete rewritten exception text. I'll file a ticket anyhow to see if I can stop the exception rewriting. In the mean time I can merge this for the older pybind11 we have.
Christopher Waters and I just ran into another error_already_set terminate in a new branch for DM-14980, so it'd definitely be nice to get this patch onto (LSST) master to make it easy to test if it fixes that problem as well.
Merged. Jenkins passed.
Looking at the pybind11 import code:
PyObject *obj = PyImport_ImportModule(name);
}
it seems that it always throws error_already_set and that is consistent with the error message above. Who should be catching this exception? I don't know enough about pybind11 to understand whether it's meant to be left to propagate all the way back to python where python will understand it properly (which is not what's happening).