Details
Type: RFC
Status: Implemented
Resolution: Done
Component/s: DM
Labels: None
Description
Pim Schellart recently spent some time exploring a few alternatives to Swig by building wrappers for some example C++ interfaces I put together. The results for Cython and Pybind11 have been written up in technical reports (DMTN-13 and DMTN-14, respectively); CFFI was rejected early because it would require writing pure C wrappers for every C++ interface.
I encourage everyone to read those notes and form their own opinions, but I think they make a very strong case for switching to Pybind11 and essentially rule out Cython. In particular:
- Pybind11 is essentially a full rewrite of Boost.Python, but as a dependency-free, header-only library. It has all the careful handling of edge cases and memory management that Boost.Python has, better support for the C++ standard library, and the extensibility that comes from being able to write customization code directly in C++ without going through a code generator. I don't have a good sense of how widely adopted it is (the fact that it's been around for less than a year puts a pretty strong upper bound on that), but the main developer is very active and responsive, and it has excellent documentation.
- Cython has a large market share (mostly, I think, because it's very good at adding a small amount of compiled code to Python), but its C++ support is immature, and its developers have made some architectural decisions that make me doubt it will ever be really good at wrapping C++. I'd put its ceiling there at perhaps only slightly better than Swig, and it's not anywhere close to Swig now. This is a disappointment: Cython is used by Astropy and by a significant fraction of the scientific community. But at this point I'd rather use Swig, or even the raw Python C API, to wrap C++.
With that in mind, I think our choices come down to switching to Pybind11 or rewriting much of our Swig interface code to improve our dependency handling. Given that even the latter would be a significant amount of work, and that I think Pybind11 is the better choice from a (preliminary) technical standpoint, I'd like to propose that Pim Schellart spend some fraction of his time over the next few months actually converting Science Pipelines code (from the bottom up) to use Pybind11 on a branch. The intent is that this would inform a later decision, around the time of the AHM, on whether to convert the rest of the stack or throw away the branch.
All I'm proposing right now is that we devote some of Pim's time to this project; I'd like to allocate enough effort that we have a reasonable shot at getting through much or all of afw, but his actual pace will tell us quite a bit about the cost of a more complete conversion.
One reason I'm attracted to Pybind11 is that we do want to spend more effort defining Pythonic interfaces. I think this is easier to do in Pybind11, and wanting custom-crafted interfaces negates much of Swig's advantage of automatic interface generation. But I'm not proposing that we make any such changes while converting to Pybind11; I think it's much easier to maintain the same Python interfaces wherever possible for now, and to deal with making them more Pythonic in the future.
Andy Salnikov, the good news is that I think pybind11 addresses your first two issues completely: I think its documentation is a vast improvement over Boost.Python's, and the dependency issue obviously goes away, since pybind11 is a header-only library with no dependencies.
The bad news is that I think the rest of your criticisms still apply. But I personally disagree with a few of them, not because I dispute the downsides you describe but because they're tradeoffs where I value the upside more. In particular, I much prefer a complex, RTTI/template-driven pure C++ library to a code generator, because I'd rather step through library logic in a debugger than be forced to guess at the code generator's logic (or build the generator from source and step through that). I also find that having a library in this role gives you essentially unlimited extensibility, while relying on a code generator can close off solutions to problems the generator's authors did not anticipate.
I think you're also right that RTTI-based type dispatch should have slightly higher overhead than custom dispatch code. I suspect this is small compared with the overall overhead of converting from C++ to Python, but I don't actually know that. However, I think using RTTI for the type conversion system makes it much more likely that the binding approach will scale up to a large number of modules, because it puts the responsibility for conveying type information between dependent modules on the linker. Without RTTI, the bindings for a dependent package need to include some information from the bindings for all of its dependencies. I think it's possible for a bindings generator to make that information lightweight enough not to significantly affect the build time of the dependent module, but I think most binding generators don't do this well. (Whether Swig does this well enough depends on your definition of "significant". The way we were using it meant that we were including much more than we needed, and that piled up; with an RTTI-based system there's actually nothing to pile up, so the problem we had with Swig is virtually impossible with pybind11 or Boost.Python.)
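To make the linker-based approach concrete, here is a toy C++ sketch of RTTI-keyed converter lookup. It is not pybind11's implementation, only an illustration of the idea, and every name in it (`ConverterRegistry`, `PyObjectStub`, `Box`, `module_a`, `module_b`) is hypothetical. One "module" registers a converter for its type in a process-wide table keyed by `std::type_index`; a dependent "module" needs only the C++ definition of that type and finds the converter at runtime through `typeid`, without including anything from the first module's bindings.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Stand-in for a Python object; a real binding layer would hand back a PyObject*.
struct PyObjectStub {
    std::string repr;
};

// Process-wide converter table keyed by RTTI. Each entry takes a type-erased
// pointer to a C++ object and produces the Python-side stand-in.
class ConverterRegistry {
public:
    using Converter = std::function<PyObjectStub(const void*)>;

    static ConverterRegistry& instance() {
        static ConverterRegistry registry;  // one instance shared by all "modules"
        return registry;
    }

    template <typename T>
    void registerConverter(std::function<PyObjectStub(const T&)> fn) {
        converters_[std::type_index(typeid(T))] = [fn](const void* p) {
            return fn(*static_cast<const T*>(p));
        };
    }

    template <typename T>
    PyObjectStub toPython(const T& value) const {
        auto it = converters_.find(std::type_index(typeid(T)));
        if (it == converters_.end()) {
            throw std::runtime_error(std::string("no converter registered for ") +
                                     typeid(T).name());
        }
        return it->second(&value);
    }

private:
    std::unordered_map<std::type_index, Converter> converters_;
};

// "Module A" defines a C++ type and registers a converter for it.
namespace module_a {
struct Box {
    int width;
    int height;
};

void init() {
    ConverterRegistry::instance().registerConverter<Box>([](const Box& b) {
        return PyObjectStub{"Box(" + std::to_string(b.width) + ", " +
                            std::to_string(b.height) + ")"};
    });
}
}  // namespace module_a

// "Module B" uses module A's C++ type but none of module A's binding code:
// the converter is found at runtime from typeid(Box).
namespace module_b {
PyObjectStub getBoundingBox() {
    return ConverterRegistry::instance().toPython(module_a::Box{3, 4});
}
}  // namespace module_b
```

After `module_a::init()` runs, `module_b::getBoundingBox().repr` is `Box(3, 4)`; asking for a type nobody registered (say, `int`) throws at lookup time. Nothing about module A's bindings had to be compiled into module B, which is the property that keeps build costs from piling up.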
Using RTTI also makes it much easier to write custom wrapper code: if I have a complex C++ template type that I want to convert to Python and then manipulate with the Python C API, it's easy to invoke the to-Python converter for that type, because the converter can be looked up from the type's RTTI. If the bindings generator doesn't use RTTI, looking up that converter is much harder (in Swig, it basically requires some fragile reverse engineering).
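A second toy sketch, with the same caveats (all names hypothetical, not pybind11's API; in pybind11 itself the analogous entry point would be `py::cast`): because the converter table is keyed by `typeid(T)`, hand-written wrapper code can invoke the to-Python converter for a templated type such as `Image<float>` with one generic call, without knowing anything about how that type's bindings were written.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <vector>

// Stand-in for a Python object handle (a real layer would use PyObject*).
struct PyObjectStub {
    std::string repr;
};

// Minimal RTTI-keyed converter table, for illustration only.
using Converter = std::function<PyObjectStub(const void*)>;

std::unordered_map<std::type_index, Converter>& converterTable() {
    static std::unordered_map<std::type_index, Converter> table;
    return table;
}

template <typename T>
void registerConverter(std::function<PyObjectStub(const T&)> fn) {
    converterTable()[typeid(T)] = [fn](const void* p) {
        return fn(*static_cast<const T*>(p));
    };
}

// Generic to-Python entry point: the lookup key is just typeid(T), however
// complicated T's template arguments are.
template <typename T>
PyObjectStub toPython(const T& value) {
    auto it = converterTable().find(typeid(T));
    if (it == converterTable().end()) {
        throw std::runtime_error("no converter registered");
    }
    return it->second(&value);
}

// A templated C++ type standing in for something like an image class.
template <typename PixelT>
struct Image {
    std::vector<PixelT> pixels;
};

// Hand-written wrapper code: build a Python-style list of converted images.
// Each element's converter is found from its RTTI.
PyObjectStub imagesToPython(const std::vector<Image<float>>& images) {
    std::string out = "[";
    for (std::size_t i = 0; i < images.size(); ++i) {
        if (i > 0) out += ", ";
        out += toPython(images[i]).repr;
    }
    return PyObjectStub{out + "]"};
}
```

Note that `Image<double>` is a distinct instantiation with its own `typeid`, so converting it fails unless a converter was registered for it as well.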
Finally, the problem that converter failures only happen at runtime is an important one. I think the degree to which this can happen could increase with pybind11, as it will have less information at compile time to catch such errors (this is the flip side of using the linker to transmit all type information). And some type conversion failures can of course only be caught at runtime with any bindings generator, because Python is a dynamically typed language. Even so, I think this will happen less often in practice with Pybind11 than with Swig, but only because Swig takes the unfortunate philosophical stance that it should be possible to pass unwrapped objects to Python as opaque objects. As a result, returning an unwrapped object in Swig leads to a failure only once the returned object is used in downstream code; with Pybind11, the error will still happen at runtime (because the type converter may be defined in another module), but it will happen immediately when the object is returned from C++.
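The difference in failure timing can be sketched in the same toy style (hypothetical names, not Swig's or pybind11's real machinery). `returnOpaqueIfUnknown` mimics the Swig stance of handing Python an opaque object, so the error surfaces only when downstream code touches it; `returnOrThrow` still looks the converter up at runtime, but fails immediately at the C++/Python boundary.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Python-side value stand-in: either a converted object or an opaque handle
// that Python code can hold but not use.
struct PyValue {
    bool opaque;
    std::string repr;

    std::string attribute(const std::string& name) const {
        // Using an opaque object is where the Swig-style failure finally surfaces.
        if (opaque) throw std::runtime_error("cannot access '" + name + "' on opaque " + repr);
        return repr + "." + name;
    }
};

using Converter = std::function<PyValue(const void*)>;

std::unordered_map<std::type_index, Converter>& converterTable() {
    static std::unordered_map<std::type_index, Converter> table;
    return table;
}

// Swig-like policy: an unwrapped type becomes an opaque handle, so the return succeeds.
template <typename T>
PyValue returnOpaqueIfUnknown(const T& value) {
    auto it = converterTable().find(typeid(T));
    if (it == converterTable().end()) return PyValue{true, "<opaque object>"};
    return it->second(&value);
}

// pybind11-like policy: the lookup is still a runtime one, but a missing
// converter is reported as soon as the object crosses into Python.
template <typename T>
PyValue returnOrThrow(const T& value) {
    auto it = converterTable().find(typeid(T));
    if (it == converterTable().end()) {
        throw std::runtime_error(std::string("unregistered type ") + typeid(T).name());
    }
    return it->second(&value);
}

struct Unwrapped {
    int x;  // a C++ type nobody wrote bindings for
};
```

Returning an `Unwrapped` through the first policy succeeds and fails only later, on `attribute("x")`; through the second it throws right away, which points directly at the function that returned it.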