Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-927

Provide Task documentation for CmdLineTask

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      See Summary

        Attachments

          Issue Links

            Activity

            Hide
            rowen Russell Owen added a comment - - edited

            I think CmdLineTask should not be documented like a normal task, because it is a base class that cannot be run as a normal task. However, I used this ticket to improve the doc strings for all classes and free functions in pipe_base. The new code is in pipe_base tickets/DM-927. Here is a summary of changes:

            $ git diff --stat origin/master...origin/tickets/DM-927
             python/lsst/pipe/base/argumentParser.py |  29 ++++++++++++---------
             python/lsst/pipe/base/cmdLineTask.py    | 199 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------------
             python/lsst/pipe/base/struct.py         |  34 ++++++++++++++++++++-----
             python/lsst/pipe/base/task.py           | 166 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------
             python/lsst/pipe/base/timer.py          |  48 +++++++++++++++++++----------------
             5 files changed, 314 insertions(+), 162 deletions(-)

            Show
            rowen Russell Owen added a comment - - edited I think CmdLineTask should not be documented like a normal task, because it is a base class that cannot be run as a normal task. However, I used this ticket to improve the doc strings for all classes and free functions in pipe_base. The new code is in pipe_base tickets/ DM-927 . Here is a summary of changes: $ git diff --stat origin/master...origin/tickets/DM-927 python/lsst/pipe/base/argumentParser.py | 29 ++++++++++++--------- python/lsst/pipe/base/cmdLineTask.py | 199 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------- python/lsst/pipe/base/struct.py | 34 ++++++++++++++++++++----- python/lsst/pipe/base/task.py | 166 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------------- python/lsst/pipe/base/timer.py | 48 +++++++++++++++++++---------------- 5 files changed, 314 insertions(+), 162 deletions(-)
            Hide
            price Paul Price added a comment -

            I've only looked at the diff. Let me know if I should look at the final product as well.

            I have a lot of comments because there's a lot of additions to review, and because I've been a bit picky given the current push on clear docs. This has definitely been a worthwhile effort, so please don't be discouraged by the number of review comments.

            • TaskRunner:
              • The description mentions a "command-line task" --> CmdLineTask (at the least, capitalise Task). In general, say "Task" instead of "task", "TaskRunner" instead of "task runner", etc., in order to reinforce the paradigms (and hopefully doxygen will cross-link).
              • Can you cross-link the "how to write a command-line task"?
              • "Task's RunnerClass": be explicit that this is a class variable.
              • In getTargetList: "the parsed command object" --> "the argparse.Namespace object".
              • Line 251 is over-length.
              • TaskRunner.makeTask is confusing. Why does it need to have two different ways of calling it? If that is essential, wouldn't it be better to have "makeTaskFromArgs" and "makeTaskFromParsedCmd"?
            • CmdLineTask:
              • Be explicit that the ConfigClass, etc., are class attributes.
              • Can we get rid of the leading underscore on _DefaultName? And why is it capitalised when it's not a class?
              • Explain what each of the class attributes are used for, so if someone is having problems they might be able to link back to a missing or bad one of them.
              • Add a note on parseAndRun that this is the usual entry-point for a CmdLineTask, and give an example bin script.
              • Misspelt "parser" as "parer".
              • Add a note in parseAndRun that anything returned by run must be picklable to work with multiprocessing.
              • Can we get rid of the leading underscore on _makeArgumentParser? Might it be better to have a ParserClass attribute of CmdLineTask that we construct, in the same way as we do with the RunnerClass?
              • Doc on writeConfig and writeSchemas is a bit too verbose: first the param clobber explains how everything operates, and then there's a @throw explaining it over again.
              • Trailing backslash on line 462 is unnecessary.
              • writeSchemas says that it's written "to dataset type self._getConfigName()", which is a cut/paste error.
              • writeSchemas should be re-written to remove the warning. Stuff anything that needs to be saved into a list and write it after the validation.
              • Out of curiosity, why are you going for %r instead of %s?
              • Line 493: put the operator on the previous line — break after an operator, not before.
            • Struct:
              • Explain why you would use Struct instead of a dict or some other object.
              • Encourage people to inherit from Struct for their own containers (in particular, it provides a much better __repr__ than object.
              • __safeAdd: being too verbose by explaining the "except" twice.
            • Task:
              • Can you cross-link the "How to Write a Task"?
              • getSchemaCatalogs: too much repitition, "Return the schemas..." then "@return a dict". Add a note that this is what subclasses will normally override (rather than getAllSchemaCatalogs)?
              • getFullMetadata: "topLeveltTaskName" has an extra "t".
              • getFullName and getName aren't very pythonic.
              • You have previously (e.g., in CmdLineTask) used @param foo this is a foo and now are using @param[in] foo: this is a foo.
              • timer:
                • You refer to pexLog.Log; shouldn't this use the full dotted namespace to ensure linking?
                • "someCodeToTime" isn't a good example — am I supposed to put code there, or a name?
              • display:
                • I suggest deprecating the display method — it's not the all-purpose displaying method I was hoping to create, and we're looking for a better solution.
                • You mention lsst.detection.SourceSet, which I've never heard of. Probably you missed afw, and were referring to a class that has been removed.
                • lsst.afw.Coord should be lsst.afw.coord.Coord
                • WCS --> Wcs.
                • "sources matches" --> "source matches"; why not specify a class (lsst.afw.table.ReferenceMatch), as you do for other arguments?
                • What kind of exception is thrown?
            • timer:
              • You've switched doxygen styles again here; now it's @param foo <lots of spaces> this is a foo
              • In logPairs, state that a suitable obj would be a Task (principle user).
              • In logInfo, prefix isn't clarified by the example given — what is the "prefix" part of <funcName>Start? Is it <funcName> or <funcName>Start? And what gets appended to this?
              • In logInfo you refer to pexLog.Log — spell it out (as you do elsewhere).
            Show
            price Paul Price added a comment - I've only looked at the diff. Let me know if I should look at the final product as well. I have a lot of comments because there's a lot of additions to review, and because I've been a bit picky given the current push on clear docs. This has definitely been a worthwhile effort, so please don't be discouraged by the number of review comments. TaskRunner : The description mentions a "command-line task" --> CmdLineTask (at the least, capitalise Task ). In general, say "Task" instead of "task", "TaskRunner" instead of "task runner", etc., in order to reinforce the paradigms (and hopefully doxygen will cross-link). Can you cross-link the "how to write a command-line task"? "Task's RunnerClass": be explicit that this is a class variable. In getTargetList : "the parsed command object" --> "the argparse.Namespace object". Line 251 is over-length. TaskRunner.makeTask is confusing. Why does it need to have two different ways of calling it? If that is essential, wouldn't it be better to have "makeTaskFromArgs" and "makeTaskFromParsedCmd"? CmdLineTask : Be explicit that the ConfigClass , etc., are class attributes. Can we get rid of the leading underscore on _DefaultName ? And why is it capitalised when it's not a class? Explain what each of the class attributes are used for, so if someone is having problems they might be able to link back to a missing or bad one of them. Add a note on parseAndRun that this is the usual entry-point for a CmdLineTask , and give an example bin script. Misspelt "parser" as "parer". Add a note in parseAndRun that anything returned by run must be picklable to work with multiprocessing. Can we get rid of the leading underscore on _makeArgumentParser ? Might it be better to have a ParserClass attribute of CmdLineTask that we construct, in the same way as we do with the RunnerClass ? Doc on writeConfig and writeSchemas is a bit too verbose: first the param clobber explains how everything operates, and then there's a @throw explaining it over again. Trailing backslash on line 462 is unnecessary. writeSchemas says that it's written "to dataset type self._getConfigName()", which is a cut/paste error. writeSchemas should be re-written to remove the warning. Stuff anything that needs to be saved into a list and write it after the validation. Out of curiosity, why are you going for %r instead of %s ? Line 493: put the operator on the previous line — break after an operator, not before. Struct : Explain why you would use Struct instead of a dict or some other object . Encourage people to inherit from Struct for their own containers (in particular, it provides a much better __repr__ than object . __safeAdd : being too verbose by explaining the "except" twice. Task : Can you cross-link the "How to Write a Task"? getSchemaCatalogs : too much repitition, "Return the schemas..." then "@return a dict". Add a note that this is what subclasses will normally override (rather than getAllSchemaCatalogs )? getFullMetadata : "topLeveltTaskName" has an extra "t". getFullName and getName aren't very pythonic. You have previously (e.g., in CmdLineTask ) used @param foo this is a foo and now are using @param[in] foo: this is a foo . timer : You refer to pexLog.Log ; shouldn't this use the full dotted namespace to ensure linking? "someCodeToTime" isn't a good example — am I supposed to put code there, or a name? display : I suggest deprecating the display method — it's not the all-purpose displaying method I was hoping to create, and we're looking for a better solution. You mention lsst.detection.SourceSet , which I've never heard of. Probably you missed afw , and were referring to a class that has been removed. lsst.afw.Coord should be lsst.afw.coord.Coord WCS --> Wcs . "sources matches" --> "source matches"; why not specify a class (lsst.afw.table.ReferenceMatch), as you do for other arguments? What kind of exception is thrown? timer : You've switched doxygen styles again here; now it's @param foo <lots of spaces> this is a foo In logPairs , state that a suitable obj would be a Task (principle user). In logInfo , prefix isn't clarified by the example given — what is the "prefix" part of <funcName>Start ? Is it <funcName> or <funcName>Start ? And what gets appended to this? In logInfo you refer to pexLog.Log — spell it out (as you do elsewhere).
            Hide
            rowen Russell Owen added a comment -

            Regarding links to documentation in pipe_tasks: I would love to be able to do this, but I don't know how to do it (at least without a lot of doxygen errors and only the hope that the links may eventually work when all the documentation is cross-linked).

            It is a strong argument in favor of moving "how to write a task" and "how to write a command-line task" into pipe_base. The problem with doing that is that this documentation can usefully link to packages that pipe_base does not depend on, so it just shifts the problem around.

            I'll get to work on your other suggestions.

            Show
            rowen Russell Owen added a comment - Regarding links to documentation in pipe_tasks: I would love to be able to do this, but I don't know how to do it (at least without a lot of doxygen errors and only the hope that the links may eventually work when all the documentation is cross-linked). It is a strong argument in favor of moving "how to write a task" and "how to write a command-line task" into pipe_base. The problem with doing that is that this documentation can usefully link to packages that pipe_base does not depend on, so it just shifts the problem around. I'll get to work on your other suggestions.
            Hide
            rowen Russell Owen added a comment -

            I applied your suggested changes with the following exceptions and comments:

            • I relented on cross-linking slightly by adding links to Task and CmdLineTask.
              There will be 3 broken links (until the documentation is cross-referenced,
              at which point I hope they start to work). I added a note as to where to look
              if the links are broken, since that information is no longer self-evident.
            • TaskRunner.makeTask is confusing: but agree, but I don't think it should be changed;
              if it needs to be overridden I'd rather do it in one place than two.
              Fortunately it almost never needs to be touched.
            • CmdLineTask._DefaultName name complaints: I usually name class constants with leading uppercase,
              but then `canMultiprocess` is the odd man out. Changing the case of one of thes and losing
              the leading underscore for _DefaultName makes sense, but should be a new ticket.
            • CmdLineTask: Explain what each of the class attributes are used for: I already do this
              in CmdLineTask's class doc strong. If what is there is not sufficient, please explain.
            • "Add a note in parseAndRun that anything returned by run must be picklable...":
              good idea, but it put it in CmdLineTask's class doc string, instead.
            • "Trailing backslash on line 462 is unnecessary": true, but I think it adds clarity.
            • "Can we get rid of the leading underscore on _makeArgumentParser...":
              it is a protected method, not a part of the public API but something users should not call.
              As such, I believe the leading underscore is well justified. However, I agree that it is
              a pain from a doxygen perspective, since by default doxygen does not show the documentation.
              I'm just not willing to let doxygen quirks interfere with what I consider to be good coding practice.
              I have handled the problem in "how to write" manuals, and I think it suffices.
              I could also modify doxygen config to show hidden methods. That might be wise,
              but I'm a bit worried about what may be shown that is private.
            • why %r instead of %s?: I assume you are referring to formatting the error message in writeConfig and
              writeSchemas? I use %r to put the string in quotes. (It also provides more useful information
              if the config name is not a string, but that's not the main purpose).
            • "Encourage people to inherit from Struct for their own containers": I don't understand why this is useful;
              I certainly didn't design the class with inheritance in mind.
            • "__safeAdd: being too verbose by explaining the "except" twice." I agree, but I don't see an easy way out.
            • formatting of @param: In the past I always used `@param[in] foo: description of foo`, but it turns out
              Doxygen doesn't like this, so I've decided to stop using a separator (since every one I've tried
              results in ugly html). Unfortunately, I find that a single space makes the source code hard to read,
              so I use multiple spaces to a suitable 4-char "tab stop" to line them all up. As you found, I had not yet
              switched all the code; I've now done that.
            • "What kind of exception is thrown?" by Task.display: I have no idea, but it's a subclass of Exception,
              which is good enough for the documentation. I think the way this case is handled is a bug or misfeature,
              but since the method is deprecated it doesn't really matter.
            Show
            rowen Russell Owen added a comment - I applied your suggested changes with the following exceptions and comments: I relented on cross-linking slightly by adding links to Task and CmdLineTask. There will be 3 broken links (until the documentation is cross-referenced, at which point I hope they start to work). I added a note as to where to look if the links are broken, since that information is no longer self-evident. TaskRunner.makeTask is confusing: but agree, but I don't think it should be changed; if it needs to be overridden I'd rather do it in one place than two. Fortunately it almost never needs to be touched. CmdLineTask._DefaultName name complaints: I usually name class constants with leading uppercase, but then `canMultiprocess` is the odd man out. Changing the case of one of thes and losing the leading underscore for _DefaultName makes sense, but should be a new ticket. CmdLineTask: Explain what each of the class attributes are used for: I already do this in CmdLineTask's class doc strong. If what is there is not sufficient, please explain. "Add a note in parseAndRun that anything returned by run must be picklable...": good idea, but it put it in CmdLineTask's class doc string, instead. "Trailing backslash on line 462 is unnecessary": true, but I think it adds clarity. "Can we get rid of the leading underscore on _makeArgumentParser...": it is a protected method, not a part of the public API but something users should not call. As such, I believe the leading underscore is well justified. However, I agree that it is a pain from a doxygen perspective, since by default doxygen does not show the documentation. I'm just not willing to let doxygen quirks interfere with what I consider to be good coding practice. I have handled the problem in "how to write" manuals, and I think it suffices. I could also modify doxygen config to show hidden methods. That might be wise, but I'm a bit worried about what may be shown that is private. why %r instead of %s?: I assume you are referring to formatting the error message in writeConfig and writeSchemas? I use %r to put the string in quotes. (It also provides more useful information if the config name is not a string, but that's not the main purpose). "Encourage people to inherit from Struct for their own containers": I don't understand why this is useful; I certainly didn't design the class with inheritance in mind. "__safeAdd: being too verbose by explaining the "except" twice." I agree, but I don't see an easy way out. formatting of @param: In the past I always used `@param [in] foo: description of foo`, but it turns out Doxygen doesn't like this, so I've decided to stop using a separator (since every one I've tried results in ugly html). Unfortunately, I find that a single space makes the source code hard to read, so I use multiple spaces to a suitable 4-char "tab stop" to line them all up. As you found, I had not yet switched all the code; I've now done that. "What kind of exception is thrown?" by Task.display: I have no idea, but it's a subclass of Exception, which is good enough for the documentation. I think the way this case is handled is a bug or misfeature, but since the method is deprecated it doesn't really matter.

              People

              Assignee:
              rowen Russell Owen
              Reporter:
              rhl Robert Lupton
              Reviewers:
              Paul Price
              Watchers:
              Paul Price, Robert Lupton, Russell Owen
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins Builds

                  No builds found.