Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-5471

Wrap example C++ code with Cython

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Story Points:
      10
    • Sprint:
      DRP X16-1, DRP X16-2
    • Team:
      Data Release Production

      Description

      Take the example C++ codebase developed in DM-5470, and expose it to Python in the most idiomatic possible way using Cython. Produce a technical note describing how this was carried out and discussing any particular pain points either in implementation or results.

        Attachments

          Issue Links

            Activity

            Hide
            pschella Pim Schellart [X] (Inactive) added a comment -

            A pull request for the code in the cython branch of my python-cpp-challenge fork has been sent to Jim.
            A technote is available for review at https://github.com/lsst-dm/dmtn-013. Not sure how to assign it to anyone since it isn't a fork.

            Show
            pschella Pim Schellart [X] (Inactive) added a comment - A pull request for the code in the cython branch of my python-cpp-challenge fork has been sent to Jim. A technote is available for review at https://github.com/lsst-dm/dmtn-013 . Not sure how to assign it to anyone since it isn't a fork.
            Hide
            jbosch Jim Bosch added a comment -

            So far I've just gone through the tech note, not the code, and I'll try to avoid comments on how good/bad Cython seems to be for this problem (I want to save that for a more public discussion after the tech note is complete). The other things I noticed:

            • I'd like to see some description of the cdef class bindings that actually get produced by Cython. Does it produce a single Python type defined in C, or does it generate Python code as well? Pasting in some generated code for a method or property might be interesting.
            • I've very disturbed that you had to call Py_Initialize; that's almost certainly a problem, because normally that means embedding a new Python interpreter inside an existing C++ function. Could you get the same behavior by using the Python C API import machinery to import the Cython module at that point, instead of calling the init function directly?
            • I'm no expert on OS X linking, but I think this ought to be possible by just building a bundle, if you use sys.setdlopenflags, and don't try to link the the Swig module against the Cython one. You can see how I did this in the __init__.py file of the python2-c-api challenge solution.

            Short answer on my opinion of Cython from all of this is that I'm very unimpressed, and would have to say it honestly looks worse than Swig, even if we were to put the effort into making things Pythonic directly within Swig. We'll see if looking at the code changes my mind.

            Show
            jbosch Jim Bosch added a comment - So far I've just gone through the tech note, not the code, and I'll try to avoid comments on how good/bad Cython seems to be for this problem (I want to save that for a more public discussion after the tech note is complete). The other things I noticed: I'd like to see some description of the cdef class bindings that actually get produced by Cython. Does it produce a single Python type defined in C, or does it generate Python code as well? Pasting in some generated code for a method or property might be interesting. I've very disturbed that you had to call Py_Initialize ; that's almost certainly a problem, because normally that means embedding a new Python interpreter inside an existing C++ function. Could you get the same behavior by using the Python C API import machinery to import the Cython module at that point, instead of calling the init function directly? I'm no expert on OS X linking, but I think this ought to be possible by just building a bundle, if you use sys.setdlopenflags , and don't try to link the the Swig module against the Cython one. You can see how I did this in the __init__.py file of the python2-c-api challenge solution. Short answer on my opinion of Cython from all of this is that I'm very unimpressed, and would have to say it honestly looks worse than Swig, even if we were to put the effort into making things Pythonic directly within Swig. We'll see if looking at the code changes my mind.
            Hide
            pschella Pim Schellart [X] (Inactive) added a comment - - edited

            Thank you for the review. I agree we should defer discussions on the relevant merits of different wrapping systems to a later time. Concerning the specific issues raised, I have addressed all three and have pushed the relevant updates to code and documentation. I agree that the two technical issues are now better addressed, but I am not convinced that adding the generated code example to the technotes has clarified things (since the generated code is so verbose and mangled).

            Note that we are not embedding a Python interpreter in the C++ functions and this was mainly to just ensure that the module is initialised (which is now done better through PyImport_ImportModule).

            Overall, I am also not (yet) convinced about Cython for our use case. But that said, I am not convinced Swig is any better either.

            Show
            pschella Pim Schellart [X] (Inactive) added a comment - - edited Thank you for the review. I agree we should defer discussions on the relevant merits of different wrapping systems to a later time. Concerning the specific issues raised, I have addressed all three and have pushed the relevant updates to code and documentation. I agree that the two technical issues are now better addressed, but I am not convinced that adding the generated code example to the technotes has clarified things (since the generated code is so verbose and mangled). Note that we are not embedding a Python interpreter in the C++ functions and this was mainly to just ensure that the module is initialised (which is now done better through PyImport_ImportModule ). Overall, I am also not (yet) convinced about Cython for our use case. But that said, I am not convinced Swig is any better either.
            Hide
            swinbank John Swinbank added a comment -

            Jim will have a lot more insight than I do into what constitutes a good response to his challenge. However, here are a few fairly superficial thoughts.

            I found the way that identity of objects is managed to be fairly unidiomatic from a Python point of view. For example, adding a Doodad to a DoodadSet & then repeatedly calling as_list (or just list) on it returns a new Doodad object each time. Since they share the same underlying state that's not really a problem, but it doesn't feel like Python. (I'd imagine it could be a problem if I could set arbitrary attributes on the Doodad at Python level, but that's forbidden.) For example:

            >>> a = object()
            >>> l = []
            >>> l.append(a)
            >>> a is l.pop()
            True
            

            vs

            >>> d = challenge.basics.Doodad("foo", 1)
            >>> dds = challenge.containers.DoodadSet()
            >>> dds.add(d)
            >>> a is dds.as_list().pop()
            False
            

            On the subject of making things idiomatic, it would be nice to do better than:

            >>> challenge.basics.Doodad(1, "foo")
            TypeError: __init__() takes at least one string argument
            

            Out of interest, why do we need to set macosx-version-min=10.7?

            If the code were being merged to the stack, we'd be a bit pickier about coding standards (primarily, we don't like unexplained commented code). I don't think it's necessary to worry about that here though. I do wonder if a few more comments in the source would be helpful for future readers (especially since there may be several of them if this comparison goes out to a wider audience) but that's not strictly required given the existence of the tech note.

            There are a few typos in the reST in the technote, which unfortunately have quite big impacts on the output – they cause a few code blocks not to be displayed properly and significant information is lost. Try the following:

            --- a/index.rst
            +++ b/index.rst
            @@ -231,6 +231,7 @@ The C++ class ``Doodad`` also has a method called ``clone()`` that returns a ``u
             which also requires ``std::move`` to be declared in the ``.pxd`` file.
             
             .. code-block:: cython
            +
                 cdef extern from "<utility>" namespace "std" nogil:
                     cdef shared_ptr[Doodad] move(unique_ptr[Doodad])
                     cdef shared_ptr[Doodad] move(shared_ptr[Doodad])
            @@ -238,6 +239,7 @@ which also requires ``std::move`` to be declared in the ``.pxd`` file.
             There are a few things annoying about this. One is that a separate specialization is to be declared for every type of move and the other is that Cython doesn't like it if two specializations have the same arguments but different return types. The following is thus not allowed.
             
             .. code-block:: cython
            +
                 cdef extern from "<utility>" namespace "std" nogil:
                     cdef unique_ptr[Doodad] move(unique_ptr[Doodad])
                     cdef shared_ptr[Doodad] move(unique_ptr[Doodad]) # error!
            @@ -411,6 +413,7 @@ This is just to show how it can be done.
             Of course ``begin()`` and ``end()`` have to be declared as well.
             
             .. code-block:: cython
            +
                 cdef extern from "containers.hpp" namespace "containers":
                     cdef cppclass DoodadSet:
                         vector[shared_ptr[Doodad]].const_iterator begin() const
            @@ -438,10 +441,10 @@ These then use typemaps declared in ``basics_typemaps.i`` such as.
                 }
             
                 %typemap(in) std::shared_ptr<basics::Doodad> {
            -    if (!sptrFromDoodad($input, &$1)) {
            -        return nullptr;
            +        if (!sptrFromDoodad($input, &$1)) {
            +            return nullptr;
            +        }
                 }
            -}
             
             The key to Cython / SWIG interoperability is of course in the functions ``newDoodadFromSptr`` (that should take a ``shared_ptr<Doodad>`` and return a Python object) and ``sptrFromDoodad`` (which does the opposite).
            

            Finally, I realise that you just followed Jim's example by forking his repository to your personal account, but before we sign off on this issue I think we should make sure it's hosted in the lsst-dm organization on GitHub. I can create a repository there for you if you don't have permissions (which you probably don't) – assuming you don't object, give me a shout when you'd like me to do so.

            Show
            swinbank John Swinbank added a comment - Jim will have a lot more insight than I do into what constitutes a good response to his challenge. However, here are a few fairly superficial thoughts. I found the way that identity of objects is managed to be fairly unidiomatic from a Python point of view. For example, adding a Doodad to a DoodadSet & then repeatedly calling as_list (or just list ) on it returns a new Doodad object each time. Since they share the same underlying state that's not really a problem, but it doesn't feel like Python. (I'd imagine it could be a problem if I could set arbitrary attributes on the Doodad at Python level, but that's forbidden.) For example: >>> a = object () >>> l = [] >>> l.append(a) >>> a is l.pop() True vs >>> d = challenge.basics.Doodad( "foo" , 1 ) >>> dds = challenge.containers.DoodadSet() >>> dds.add(d) >>> a is dds.as_list().pop() False On the subject of making things idiomatic, it would be nice to do better than: >>> challenge.basics.Doodad( 1 , "foo" ) TypeError: __init__() takes at least one string argument Out of interest, why do we need to set macosx-version-min=10.7 ? If the code were being merged to the stack, we'd be a bit pickier about coding standards (primarily, we don't like unexplained commented code ). I don't think it's necessary to worry about that here though. I do wonder if a few more comments in the source would be helpful for future readers (especially since there may be several of them if this comparison goes out to a wider audience) but that's not strictly required given the existence of the tech note. There are a few typos in the reST in the technote, which unfortunately have quite big impacts on the output – they cause a few code blocks not to be displayed properly and significant information is lost. Try the following: --- a/index.rst +++ b/index.rst @@ -231,6 +231,7 @@ The C++ class ``Doodad`` also has a method called ``clone()`` that returns a ``u which also requires ``std::move`` to be declared in the ``.pxd`` file.   .. code-block:: cython + cdef extern from "<utility>" namespace "std" nogil: cdef shared_ptr[Doodad] move(unique_ptr[Doodad]) cdef shared_ptr[Doodad] move(shared_ptr[Doodad]) @@ -238,6 +239,7 @@ which also requires ``std::move`` to be declared in the ``.pxd`` file. There are a few things annoying about this. One is that a separate specialization is to be declared for every type of move and the other is that Cython doesn't like it if two specializations have the same arguments but different return types. The following is thus not allowed.   .. code-block:: cython + cdef extern from "<utility>" namespace "std" nogil: cdef unique_ptr[Doodad] move(unique_ptr[Doodad]) cdef shared_ptr[Doodad] move(unique_ptr[Doodad]) # error! @@ -411,6 +413,7 @@ This is just to show how it can be done. Of course ``begin()`` and ``end()`` have to be declared as well.   .. code-block:: cython + cdef extern from "containers.hpp" namespace "containers": cdef cppclass DoodadSet: vector[shared_ptr[Doodad]].const_iterator begin() const @@ -438,10 +441,10 @@ These then use typemaps declared in ``basics_typemaps.i`` such as. }   %typemap(in) std::shared_ptr<basics::Doodad> { - if (!sptrFromDoodad($input, &$1)) { - return nullptr; + if (!sptrFromDoodad($input, &$1)) { + return nullptr; + } } -}   The key to Cython / SWIG interoperability is of course in the functions ``newDoodadFromSptr`` (that should take a ``shared_ptr<Doodad>`` and return a Python object) and ``sptrFromDoodad`` (which does the opposite). Finally, I realise that you just followed Jim's example by forking his repository to your personal account, but before we sign off on this issue I think we should make sure it's hosted in the lsst-dm organization on GitHub. I can create a repository there for you if you don't have permissions (which you probably don't) – assuming you don't object, give me a shout when you'd like me to do so.
            Hide
            pschella Pim Schellart [X] (Inactive) added a comment -

            Thanks for the review!

            I have:

            • fixed the reST syntax;
            • added basic documentation to the code;
            • made the constructor exception message more Pythonic.

            I don't think I can fix the non-Pythonic nature of .as_list() or .as_dict because that would require storing a pointer to the Python objects in the DoodadSet as well, which kind of defeats the purpose of wrapping the container. But I might be wrong here.

            The macosx-version-min=10.7 is needed because on OSX the Python that comes with miniconda (installed by lsstsw) is built against target version 10.5 which doesn't work with C+11 in libc+.
            I added a comment explaining this to setup.py.

            Moving the GitHub repo to llst-dm would be fine with me if Jim gives the ok.

            Show
            pschella Pim Schellart [X] (Inactive) added a comment - Thanks for the review! I have: fixed the reST syntax; added basic documentation to the code; made the constructor exception message more Pythonic. I don't think I can fix the non-Pythonic nature of .as_list() or .as_dict because that would require storing a pointer to the Python objects in the DoodadSet as well, which kind of defeats the purpose of wrapping the container. But I might be wrong here. The macosx-version-min=10.7 is needed because on OSX the Python that comes with miniconda (installed by lsstsw ) is built against target version 10.5 which doesn't work with C+ 11 in libc +. I added a comment explaining this to setup.py . Moving the GitHub repo to llst-dm would be fine with me if Jim gives the ok.
            Hide
            jbosch Jim Bosch added a comment -

            Pim Schellart [X], could you please rebase your branch on top of my repo's master? I'm mostly seeing changes that I've made on master on the PR page.

            John Swinbank, mind if we get the PRs for this and pybind11 merged, and then move this to lsst-dm?

            Show
            jbosch Jim Bosch added a comment - Pim Schellart [X] , could you please rebase your branch on top of my repo's master? I'm mostly seeing changes that I've made on master on the PR page. John Swinbank , mind if we get the PRs for this and pybind11 merged, and then move this to lsst-dm?
            Hide
            pschella Pim Schellart [X] (Inactive) added a comment -

            Rebase done.

            Show
            pschella Pim Schellart [X] (Inactive) added a comment - Rebase done.
            Hide
            swinbank John Swinbank added a comment -

            John Swinbank, mind if we get the PRs for this and pybind11 merged, and then move this to lsst-dm?

            Fine with me.

            Show
            swinbank John Swinbank added a comment - John Swinbank, mind if we get the PRs for this and pybind11 merged, and then move this to lsst-dm? Fine with me.
            Hide
            jbosch Jim Bosch added a comment - - edited

            Pim Schellart [X], at https://github.com/TallJimbo/python-cpp-challenge/pull/1/commits I'm still seeing three commits from me in the PR. Can you figure out what went wrong?

            Show
            jbosch Jim Bosch added a comment - - edited Pim Schellart [X] , at https://github.com/TallJimbo/python-cpp-challenge/pull/1/commits I'm still seeing three commits from me in the PR. Can you figure out what went wrong?
            Hide
            pschella Pim Schellart [X] (Inactive) added a comment -

            Ok, cleanup complete.

            Show
            pschella Pim Schellart [X] (Inactive) added a comment - Ok, cleanup complete.
            Hide
            jbosch Jim Bosch added a comment -

            Review complete. I've made a few comments on the Cython python-cpp-challenge PR, as well as on some commits for the tech report (since I couldn't find a better way to make line comments on that).

            There are some minor grammatical issues, but I don't think it's worth either of our time for me to go through and point them all out to you, and they don't get in the way of understanding the document at all. If you don't mind, I'll just edit the tech report source myself to make those changes?

            The big comment I have is that I'd like to actually have some more "typical" wrapper code in the tech report at some level, with some annotation and a copy of the C++ code that it wraps - or at least enough that a reader can see what the Cython bindings for a typical method looks like.

            Show
            jbosch Jim Bosch added a comment - Review complete. I've made a few comments on the Cython python-cpp-challenge PR, as well as on some commits for the tech report (since I couldn't find a better way to make line comments on that). There are some minor grammatical issues, but I don't think it's worth either of our time for me to go through and point them all out to you, and they don't get in the way of understanding the document at all. If you don't mind, I'll just edit the tech report source myself to make those changes? The big comment I have is that I'd like to actually have some more "typical" wrapper code in the tech report at some level, with some annotation and a copy of the C++ code that it wraps - or at least enough that a reader can see what the Cython bindings for a typical method looks like.
            Hide
            jbosch Jim Bosch added a comment - - edited

            Also,

            I don't think I can fix the non-Pythonic nature of .as_list() or .as_dict because that would require storing a pointer to the Python objects in the DoodadSet as well, which kind of defeats the purpose of wrapping the container. But I might be wrong here.

            I think this is basically right. The closest you can get to the behavior you'd want in Python is to construct the C++ shared_ptr by putting a PyObject in the shared_ptr's deleter instead of putting a shared_ptr in the PyObject (this is something Boost.Python does, but only under certain conditions). But that only works when the container is a container of shared_ptr, and it has the disadvantage that you have to lock the GIL when a shared_ptr goes out of scope in order to be threadsafe (something Boost.Python notoriously does not do).

            Show
            jbosch Jim Bosch added a comment - - edited Also, I don't think I can fix the non-Pythonic nature of .as_list() or .as_dict because that would require storing a pointer to the Python objects in the DoodadSet as well, which kind of defeats the purpose of wrapping the container. But I might be wrong here. I think this is basically right. The closest you can get to the behavior you'd want in Python is to construct the C++ shared_ptr by putting a PyObject in the shared_ptr 's deleter instead of putting a shared_ptr in the PyObject (this is something Boost.Python does, but only under certain conditions). But that only works when the container is a container of shared_ptr , and it has the disadvantage that you have to lock the GIL when a shared_ptr goes out of scope in order to be threadsafe (something Boost.Python notoriously does not do).
            Hide
            pschella Pim Schellart [X] (Inactive) added a comment -

            I have implemented the requested fixes to the code and documentation and it is ready for a final review.
            Note that I haven't yet described the split into MutableDoodad + ImmutableDoodad in the documentation.
            I will wait with that until you decide if this is a better solution.
            If not then I will revert that commit.
            Please feel free to fix the spelling and grammar in the technote.

            Show
            pschella Pim Schellart [X] (Inactive) added a comment - I have implemented the requested fixes to the code and documentation and it is ready for a final review. Note that I haven't yet described the split into MutableDoodad + ImmutableDoodad in the documentation. I will wait with that until you decide if this is a better solution. If not then I will revert that commit. Please feel free to fix the spelling and grammar in the technote.
            Hide
            pschella Pim Schellart [X] (Inactive) added a comment -

            Two minor actions remain on this ticket:

            • grammar fixes in technote and;
            • merge of code branch on private github followed by move to LSST repository.

            Both of these will be done by Jim, so I'm closing this ticket now.

            Show
            pschella Pim Schellart [X] (Inactive) added a comment - Two minor actions remain on this ticket: grammar fixes in technote and; merge of code branch on private github followed by move to LSST repository. Both of these will be done by Jim, so I'm closing this ticket now.

              People

              Assignee:
              pschella Pim Schellart [X] (Inactive)
              Reporter:
              swinbank John Swinbank
              Reviewers:
              Jim Bosch
              Watchers:
              Jim Bosch, John Swinbank, Pim Schellart [X] (Inactive), Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.