RFC-214

Use numpydoc and reStructuredText for Python docstrings

    Details

    • Type: RFC
    • Status: Implemented
    • Resolution: Done
    • Component/s: DM
    • Labels:
      None

      Description

      We'll soon (~1-2 months) have a documentation system that renders HTML documentation from Python docstrings written with reStructuredText.

      At one of the PCW hack sessions (with a large majority of non-DMLT DM members in attendance) we had unanimous agreement that we should start writing Python docstrings in this form now, instead of prefixing docstrings with a "!" to invoke Doxygen markup.

      This will temporarily make HTML documentation for Python uglier (it will be rendered as reStructuredText source), but will eventually make it nicer, and it will make on-line Python docstrings modestly more readable (because reStructuredText source is more readable than Doxygen markup).

      If other hack session work goes quickly and this RFC is accepted ahead of schedule, this could give us an opportunity to start this transition for existing code during the meeting.
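To make the difference concrete, here is a minimal sketch contrasting the two docstring styles. The function names, parameters, and table layout are hypothetical and chosen only for illustration; the first docstring uses the "!" prefix with Doxygen commands, and the second uses numpydoc sections written in reStructuredText.

```python
def scale_value_doxygen(value, factor=2.0):
    """!Scale a value by a constant factor

    @param[in] value   number to scale
    @param[in] factor  multiplier applied to value
    @return the scaled number
    """
    return value * factor


def join_tables(left, right, key="id"):
    """Join two tables on a shared key.

    Parameters
    ----------
    left : `list` of `dict`
        Rows of the first table.
    right : `list` of `dict`
        Rows of the second table.
    key : `str`, optional
        Name of the column present in both tables.

    Returns
    -------
    joined : `list` of `dict`
        One merged row for every pair of rows whose ``key`` values match.
    """
    # Index the right-hand rows by key so each left-hand row is matched quickly.
    index = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]
```

Read as plain text (for example through `help(join_tables)`), the numpydoc version needs no knowledge of markup commands, which is the readability gain described above.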

            Activity

            jbosch Jim Bosch created issue -
            rhl Robert Lupton added a comment - - edited

            Do we have a plan for how we will integrate C++ and Python documentation? If we'll still be using Doxygen for C++, is this RFC a step backwards?

            pschella Pim Schellart [X] (Inactive) made changes -
            Description: Jira color markup ("{color:#14892c}colored text{color}") was added at the start of the text; the rest was unchanged.
            pschella Pim Schellart [X] (Inactive) made changes -
            Description: the color markup was removed, restoring the original text.
            pschella Pim Schellart [X] (Inactive) added a comment -

            If we are switching to pybind11, I would suggest writing C++ API documentation in Doxygen format, but writing anything exposed to Python in reStructuredText format in the wrapper files. We might want to use placeholders with some kind of automatic replacement to prevent duplication there, though.

            jsick Jonathan Sick added a comment -

            Thanks for writing this RFC, Jim Bosch, and advocating this approach. I agree with this given that it will help in hastening our transition to the documentation system (Sphinx) that we as a group have been discussing for a year now. The infrastructure for building documentation from numpydoc is my current focus (pending affirmation of loading for the second half of F16). Effectively this RFC will allow us to start writing modern content a couple months in advance, rather than waiting for me to finish my F16 epics.

            I’ve described the numpydoc format in our Developer Guide at https://developer.lsst.io/docs/py_docs.html. I’m happy to coach developers on this format and iterate on that documentation.

            One catch is that I haven’t implemented documentation builds for each package (either on a developer’s machine, or as part of CI). Without building documentation there’s a risk of formatting errors in the reStructuredText. We can either accept these errors and clean them up when Sphinx is integrated fully in our build servers, or I can implement prototype-quality Sphinx builds for our packages. I’m in the process of demonstrating this with daf_base (DM-7095), and I think I could quickly roll this out to other packages.

            To answer Robert Lupton’s question, Doxygen is the best API documentation system that exists (to my knowledge) for C/C++. My plan (as demonstrated in a working prototype) is to use doxygen to generate XML that’s converted into native Sphinx format with breathe. This seems to work quite well. In fact, the biggest difficulty is in documenting SWIG’d APIs from a Python context. It seems that writing numpydoc in our SWIG or pybind11 files could be an effective solution to this. To be clear though, we’ll be retiring Doxygen in the sense of 1) a format for Python docstrings 2) a format for writing prose (.dox) and 3) a tool for publishing HTML.
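For readers unfamiliar with that pipeline, a Sphinx `conf.py` fragment along these lines would wire the pieces together. This is only a sketch under stated assumptions, not the configuration of the prototype mentioned above: the project name and the XML path are hypothetical, while `extensions`, `breathe_projects`, and `breathe_default_project` are standard Sphinx and breathe settings.

```python
# conf.py (sketch): Sphinx configuration for mixed Python and C++ API docs.

project = "daf_base"  # assumption: one Sphinx project per package

extensions = [
    "sphinx.ext.autodoc",  # pulls docstrings out of importable Python modules
    "numpydoc",            # parses numpydoc sections such as Parameters/Returns
    "breathe",             # reads Doxygen XML and emits native Sphinx nodes
]

# Hypothetical location of the XML that Doxygen writes when GENERATE_XML = YES.
breathe_projects = {"daf_base": "_doxygen/xml"}
breathe_default_project = "daf_base"
```

With such a setup, a reStructuredText page can pull in a C++ class through a breathe directive such as `.. doxygenclass::`, while Python APIs come from autodoc and numpydoc.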

            Parejkoj John Parejko added a comment -

            I approve this message. Doxygen is harder to read in-line than numpydoc, and this will help kickstart the transition.

            rhl Robert Lupton added a comment -

            OK, sounds like this is a plan.

            ktl Kian-Tat Lim added a comment -

            I'm fine with this. Can I get someone to say what actual text would be added or modified, presumably here? I'd like that to have as few LSST customizations as possible. And note that that text currently points to Trac which in turn redirects to Confluence, which is highly undesirable and needs to be superseded (as it refers to doxygen for Python).

            jsick Jonathan Sick added a comment - - edited

            Responding to Kian-Tat Lim, I’d change https://developer.lsst.io/coding/python_style_guide.html#documentation-strings in the following ways:

            1. In the introduction, state that we use PEP 257 in general, but specifically follow numpydoc, and link to our numpydoc style guide at https://developer.lsst.io/docs/py_docs.html. That is, most of the technical description of numpydoc is in that latter page. This also deprecates the link to Trac and https://confluence.lsstcorp.org/display/LDMDG/Documentation+Standards.

            2. Change the section “Docstrings SHOULD start with a 1-line imperative summary ending in a period” to “Docstrings for functions and methods SHOULD start with a 1-line present tense action summary sentence.” The present tense action (e.g. “Joins two tables.”) is consistent with the numpy documentation style.

            There’d be a similar section stating that classes and modules should have docstrings with a one-sentence summary of the class or module’s role. The doc style for classes and modules is different from that for methods and functions only because functions and methods perform actions while classes and modules don’t.

            Of course, the page at https://developer.lsst.io/docs/py_docs.html will then become a “coding standard.” Does that page then need to be accepted by TCT/equivalent?

            Also, this is beside the point in this RFC, but as we deprecate the Confluence page on documentation standards, the C++ documentation standards will be at https://developer.lsst.io/docs/cpp_docs.html. This is the best I could write given existing documentation, but I think that page could be improved.

            ktl Kian-Tat Lim added a comment -

            If it's a coding standard, I believe I'm BDFL for it. I'll read through py_docs.html when I get a chance, but I don't see any problem right now with adopting and implementing this change.

            jbosch Jim Bosch made changes -
            Link This issue relates to DM-7345 [ DM-7345 ]
            jbosch Jim Bosch added a comment -

            Adopted; I think all concerns have been addressed and Kian-Tat Lim has explicitly signed off on it. I'm assigning the implementation issue (DM-7345) to Jonathan Sick.

            jbosch Jim Bosch made changes -
            Resolution Done [ 10000 ]
            Status Proposed [ 10805 ] Adopted [ 10806 ]
            tjenness Tim Jenness made changes -
            Link This issue is triggering DM-7345 [ DM-7345 ]
            tjenness Tim Jenness made changes -
            Link This issue relates to DM-7345 [ DM-7345 ]
            rowen Russell Owen added a comment -

            Jonathan Sick said:

            "The present tense action (e.g. “Joins two tables.”) is consistent with the numpy documentation style."

            The example should be "Join two tables", not "Joins two tables", as per this quote from PEP 257:

            The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

            jsick Jonathan Sick added a comment -

            I could swear that I read guidance saying that the 'summary' sentence for a method or function should be written in the present tense, ideally with the method/function name as the verb. However, I can't seem to find that bit of research again. I think web companies generally use that construction since it's a bit friendlier.

            On the other hand, you're right, PEP 257 and numpydoc do seem to advocate the imperative voice here.

            I'm working on a ticket to implement both this RFC and the PEP 8 RFC, and I'll do some more research on this.

            jsick Jonathan Sick added a comment -

            Yep, I was simply confused. Imperative all the way for method/function summaries.
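As an illustration of the convention settled on here, the sketch below pulls the summary line out of a docstring and applies a crude check for the imperative form. The helper names and the two sample functions are hypothetical, and the rule (the first word must not look like a third-person verb ending in a single "s") is only a rough heuristic assumed for this example, not part of PEP 257 or numpydoc.

```python
import inspect


def summary_line(obj):
    """Return the first line of an object's docstring, or an empty string."""
    doc = inspect.getdoc(obj) or ""
    return doc.splitlines()[0] if doc else ""


def looks_imperative(summary):
    """Guess whether a summary line is a command that ends in a period."""
    words = summary.split()
    if not words or not summary.endswith("."):
        return False
    first = words[0].lower()
    # Heuristic only: "Joins" or "Returns" is flagged, "Join" or "Pass" is not.
    return not (first.endswith("s") and not first.endswith("ss"))


def join(left, right):
    """Join two tables."""


def joins(left, right):
    """Joins two tables."""


for func in (join, joins):
    print(func.__name__, looks_imperative(summary_line(func)))
```

A check like this will misjudge verbs that legitimately end in "s", so it is better suited to flagging candidates for review than to enforcing the rule.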

            jsick Jonathan Sick made changes -
            Link This issue is triggering DM-5456 [ DM-5456 ]
            jsick Jonathan Sick made changes -
            Status Adopted [ 10806 ] Implemented [ 11105 ]

              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              jbosch Jim Bosch
              Watchers:
              Cindy Wang [X] (Inactive), David Shupe, Jim Bosch, John Parejko, John Swinbank, Jonathan Sick, Kian-Tat Lim, Pim Schellart [X] (Inactive), Robert Lupton, Russell Owen, Tim Jenness, Xiuqin Wu [X] (Inactive)
              Votes:
              1 Vote for this issue
