DM-33034

Create initial doc builder for pipeline definitions

    Details

      Description

      Pipelines are defined in YAML files, rather than Python code, so we can't use our usual Sphinx-based tooling to generate documentation for them. This ticket will provide tooling (in pipe_base) and a real-world demonstration of its use (in drp_pipe) that builds rST docs from those YAML files for Sphinx to compile.

      In this first version, I'll use SCons to invoke these tools; that should ensure that the generated rST files are installed with the rest of the package when documenteer's stack-docs finds it later. It may be cleaner to call the same tooling directly from a Sphinx plugin in the future, but I'll leave that for another ticket (aside from trying to avoid tying the tools too closely to SCons).

      Initial doc content will include:

      • writing the pipeline in "expanded" form (see DM-33027), and including the resulting config and YAML files in rST pages via literalinclude;
      • writing and rendering GraphViz diagrams for the full pipeline and each task, and including those in the rST pages with image directives.
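
      As a rough illustration of the two kinds of content listed above, the sketch below writes GraphViz DOT text for a small task graph and an rST page that pulls in a YAML file with literalinclude and a rendered diagram with an image directive. This is not the pipe_base tooling from the PR: the function names, task labels, and file names are all hypothetical, and rendering the DOT file (for example with `dot -Tsvg`) is left out.

```python
from __future__ import annotations


def build_dot(upstream: dict[str, list[str]]) -> str:
    """Return GraphViz DOT text for a mapping of task label -> upstream task labels.

    Hypothetical helper for illustration; not part of pipe_base.
    """
    lines = ["digraph pipeline {"]
    for task in sorted(upstream):
        # Declare the node, then one edge per upstream task.
        lines.append(f'    "{task}";')
        for parent in sorted(upstream[task]):
            lines.append(f'    "{parent}" -> "{task}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


def build_pipeline_page(name: str, yaml_file: str, image_file: str) -> str:
    """Return rST source for one pipeline page.

    The page embeds an already-rendered diagram via the ``image`` directive and
    the expanded pipeline YAML via ``literalinclude``. File names are assumed
    to be relative to the generated page.
    """
    title = f"{name} pipeline"
    return "\n".join(
        [
            title,
            "=" * len(title),  # rST title underline must match the title length
            "",
            f".. image:: {image_file}",
            "",
            f".. literalinclude:: {yaml_file}",
            "   :language: yaml",
            "",
        ]
    )


if __name__ == "__main__":
    # Illustrative task labels and file names only.
    print(build_dot({"isr": [], "calibrate": ["isr"]}))
    print(build_pipeline_page("DRP", "DRP.yaml", "DRP.svg"))
```

      Returning strings rather than writing files keeps the sketch independent of whatever invokes it, whether that is SCons or a Sphinx plugin.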

      There's a lot more we could do, but I'll create separate tickets for the rest. Everything described above has actually already been done on DM-30891 branches; this ticket exists just to split that functionality out for separate review and merge.

            Activity

            jbosch Jim Bosch added a comment -

            Jonathan Sick, this is a holiday review request where merging is blocked by other tickets anyway: absolutely no rush!

            So, this ticket is a follow-up to our community discussion on pipeline docs, but it's more of a minimum viable product than a sketch: I think it will give us something that works (and can be built on; see linked tickets) until you have some time to think about how to integrate this more cleanly with the doc build.

            Changes are in four PRs; note that many of these are sitting on top of a few other tickets that are being reviewed in parallel, so the PR base isn't always main:

            • https://github.com/lsst/pipe_base/pull/222: the Python code that builds the docs. This also contains a few rather mechanical commits, so I recommend reviewing commit-by-commit and looking at the PR description to see which ones are worth a close look (tl;dr: mostly just the last one).
            • https://github.com/lsst/ctrl_mpexec/pull/159: pure removals (code in this package is being moved to pipe_base in one of those mechanical commits).
            • https://github.com/lsst/drp_pipe/pull/1: an SCons doc build script that invokes the code in pipe_base; this is the package whose content the new documentation is being built from (so far).
            • https://github.com/lsst/pipelines_lsst_io/pull/197: just adds drp_pipe to the doc build. It will be added to lsst_distrib on DM-30891, which is one of those reviewed-in-parallel tickets.

            Finally, you can see a local pipelines.lsst.io build of these branches here: https://lsst.ncsa.illinois.edu//~jbosch/DM-33034/modules/lsst.drp.pipe/index.html

            I was a bit disappointed to see in that doc build how few of the PipelineTask class links work there, but it looks like that's a combination of two preexisting problems, not something specific to this ticket:
            1. Many upstream packages don't include important modules in their doc builds at all; some combination of their __all__, __init__.py, and index.rst lists needs to be updated.
            2. The docs I'm building here link to the Python task classes as if they were regular Python classes, and that works when (1) isn't in play (check out the HSC/RC2 pipeline and the jointcal task for a working link). But there doesn't seem to be any linkage from that class API reference page to the special Task doc page (or maybe there isn't one in this case?). Those Task pages also seem to generally be in better shape in terms of whether they exist at all upstream, but I don't see how to link to them (and ideally I think one would link to them in the same way one would link to any Python class, and the class and Task docs would be one and the same, but maybe that's not the case).

            I'd be interested in your thoughts on how best to do (2); I think (1) is just grunt work, and I'll create a ticket for the specific problems I've spotted.

            You can also find the SCons-generated rST files that went into this build on the NCSA machines, at /project/jbosch/tkt/DM-33034/drp_pipe/doc.

            jsick Jonathan Sick added a comment -

            "2. Those Task pages also seem to generally be in better shape in terms of whether they exist at all upstream, but I don't see how to link to them (and ideally I think one would link to them in the same way one would link to any Python class, and the class and Task docs would be one and the same, but maybe that's not the case)."

            I'll dig into the code next, but are you aware of the extensions we have for linking to task documentation pages (and presumably that I'll also be creating for PipelineTask documentation pages)? See https://documenteer.lsst.io/sphinx-extensions/lssttasks.html#cross-reference-roles (:lsst-task:, :lsst-config:, :lsst-config-field:).

            You're right that in the task reference documentation we want to be linking to our task reference pages, rather than directly to the API docs for tasks, because we're designing these task reference pages to be more informative for task users.
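
            To make the difference between the two link targets concrete, here is a small sketch of generating either kind of cross-reference from a task class name. It assumes the :lsst-task: role takes the fully qualified class name, as the standard Sphinx :py:class: role does; the linked documenteer page is the authority on that. The helper names are hypothetical and the class name is only an illustrative example.

```python
def api_link(task_class: str) -> str:
    """rST cross-reference to the Python API page for a class (standard Sphinx role)."""
    return f":py:class:`{task_class}`"


def task_doc_link(task_class: str) -> str:
    """rST cross-reference to the task reference page via the documenteer role.

    Assumes the role accepts a fully qualified class name.
    """
    return f":lsst-task:`{task_class}`"


if __name__ == "__main__":
    name = "lsst.jointcal.JointcalTask"  # illustrative class name
    print(api_link(name))
    print(task_doc_link(name))
```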

            Parejkoj John Parejko added a comment -

            Checking on this, as a related ticket came up while I was looking for things to pair code on. Is there anything others can do to move it forward?

            jbosch Jim Bosch added a comment - - edited

            Yeah, sorry this got stalled. The problem is that DM-33027 got stalled when the reviewer noted that all of the new functionality should really go into a new class, and the priority for me of getting that done dropped precipitously after I got DM-30891 done (while using the DM-33027 functionality on a branch to check that that refactoring didn't change any definitions).

            So, getting this done the way I'd like would require some design work that would be hard for me to transfer to somebody else, and while it's not that far down my priority list, the thing ahead of it is really quite a big one (DM-31725).

            If someone else were sufficiently motivated, they could modify DM-33027 to make it really clear that everything new there is a provisional API intended (for now) just for docs, and totally subject to change without notice in the future - maybe even adding some leading underscores. Then we could merge that without the additional design work and get back to this one.

            Another way to make progress (albeit with a longer-term merge target for this one) would be for someone else to start taking a look at Jonathan Sick's review comments, which I've barely glanced at (sorry, Jonathan!) since the upstream blocker's appearance made me realize I had to get that done first. Given how I've kind of hacked things together here to use the devil I knew (SCons) instead of what's probably the right tool for the job (Sphinx extensions), there may be some very useful and substantial work to derive from that review.

            jsick Jonathan Sick added a comment -

            I also just want to mention that if you want to start merging this into production as a sort of beta/prototype without a detailed code review from me, I'm totally fine with that. I generally like where you're going with this. I think I can use the APIs you've got here to drive the page generation from the Sphinx build rather than SCons, using an extension similar to automodapi. I've been stuck into a project for the RSP automated notebook execution since last November, but that seems to be wrapping up in the next month, so I'll be switching from RSP back to user docs and I can really engage with this sort of thing.

            I'm also musing about whether a second type of documentation for pipelines (as in, additional to static Sphinx docs) could be an interactive app hosted on the RSP. Basically, you could open an existing pipeline YAML or drop in a new one and see the graph rendered out. I've been thinking about this because on the RSP we can ship a richer JavaScript-driven experience so we could collapse nodes, zoom in, click on task nodes to see docs etc. Could be really exciting!


              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              jbosch Jim Bosch
              Reviewers:
              Jonathan Sick
              Watchers:
              Jim Bosch, John Parejko, Jonathan Sick