Data Management / DM-25385

Begin pipetask command conversion to Click; implement the 'build' subcommand.


    Details

    • Story Points:
      24
    • Sprint:
      DB_F20_06
    • Team:
      Data Access and Database
    • Urgent?:
      No

      Description

      Begin conversion of the existing pipetask command in ctrl_mpexec to the Click framework, based on or similar to the butler command.

      Implement the build subcommand. My goal is to rewrite/refactor as little of the existing implementation code as possible. There is room for improvement (e.g. not propagating an 'args' container), but it is better to make such changes after the argparse framework is removed so we don't have to maintain parallel implementations.
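
      For orientation, here is a minimal sketch of the intended shape (hypothetical code, not the actual ctrl_mpexec implementation; the option set is abbreviated):

      import click
      from types import SimpleNamespace


      @click.group()
      def cli():
          """Hypothetical top-level group standing in for the new pipetask command."""


      @cli.command(short_help="Build and optionally save pipeline definition.")
      @click.option("-p", "--pipeline",
                    help="Location of a pipeline definition file in YAML format.")
      @click.option("-s", "--save-pipeline",
                    help="Location for storing the resulting pipeline definition.")
      def build(**kwargs):
          """Build and optionally save pipeline definition.

          This does not require input data to be specified.
          """
          # Pack the Click options into an args-like container so the existing
          # implementation can be called unchanged, per the goal above.
          args = SimpleNamespace(**kwargs)
          ...  # hand off to the existing build implementation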


            Activity

            npease Nate Pease [X] (Inactive) added a comment -

            I added a command to ctrl_mpexec called pipetask2; once we're ready to switch, we can delete the existing pipetask command and related infrastructure and rename the new command, removing the '2'.

            The build subcommand is implemented; perhaps Andy Salnikov should have a look?

            The commit called "convert unit tests to mixins" is a pretty substantial change, moving to a mixin-based test strategy. Earlier commits make it so the butler command loader can be reused with pipetask. Everything else is minor cleanup and the implementation of shared options for pipetask2.
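
            The mixin approach looks roughly like this (a sketch with hypothetical names, not the actual test code):

            import unittest

            import click
            from click.testing import CliRunner


            class HelpTestMixin:
                """Shared checks; concrete test cases supply a `command` attribute."""

                def test_help(self):
                    result = CliRunner().invoke(self.command, ["--help"])
                    self.assertEqual(result.exit_code, 0)
                    self.assertIn("Usage:", result.output)


            @click.command()
            def build():
                """Stand-in command for this sketch."""


            class BuildHelpTestCase(HelpTestMixin, unittest.TestCase):
                command = build


            if __name__ == "__main__":
                unittest.main()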

            tjenness Tim Jenness added a comment -

            So this ticket is an initial conversion with the build command only? Can you make the ticket title more explicit about that please?

            npease Nate Pease [X] (Inactive) added a comment -

            Yes. Changed.

            New help text is:

            $ pipetask2 build --help
            Usage: pipetask2 build [OPTIONS]

              Build and optionally save pipeline definition.

              This does not require input data to be specified.

            Options:
              -t, --task TEXT           Task name to add to pipeline, must be a fully
                                        qualified task name. Task name can be followed by
                                        colon and label name, if label is not given than
                                        task base name (class name) is used as label.
              --show TEXT               Dump various info to standard output. Possible
                                        items are: `config', `config=[Task::]<PATTERN>' or
                                        `config=[Task::]<PATTERN>:NOIGNORECASE' to dump
                                        configuration fields possibly matching given
                                        pattern and/or task label; `history=<FIELD>' to
                                        dump configuration history for a field, field name
                                        is specified as [Task::][SubTask.]Field; `dump-
                                        config', `dump-config=Task' to dump complete
                                        configuration for a task given its label or all
                                        tasks; `pipeline' to show pipeline composition;
                                        `graph' to show information about quanta;
                                        `workflow' to show information about quanta and
                                        their dependency; `tasks' to show task
                                        composition.
              -s, --save-pipeline TEXT  Location for storing resulting pipeline definition
                                        in YAML format.
              -p, --pipeline TEXT       Location of a pipeline definition file in YAML
                                        format.
              --pipeline-dot PATH       Location for storing GraphViz DOT representation
                                        of a pipeline.
              --order-pipeline TEXT     Order tasks in pipeline based on their data
                                        dependencies, ordering is performed as last step
                                        before saving or executing pipeline.
              -i, --instrument TEXT     Add an instrument which will be used to load
                                        config overrides when defining a pipeline. This
                                        must be the fully qualified class name.
              --delete TEXT             Delete task with given label from pipeline.
              -c, --config TEXT         Config override, as a key-value pair.
              -C, --config-file TEXT    Path to a pex config override to be included after
                                        the Instrument config overrides are applied.
              -h, --help                Show this message and exit. 

            The existing pipetask command provided some guidance in the option metavar info, e.g. --show ITEM|ITEM=VALUE instead of --show TEXT. Do we want to preserve these annotations?
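
            If we do, Click accepts an explicit metavar per option; a minimal sketch (hypothetical, with the help text abbreviated):

            import click


            @click.command()
            @click.option("--show", multiple=True, metavar="ITEM|ITEM=VALUE",
                          help="Dump various info to standard output.")
            def build(show):
                ...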

            Also, the pipetask help is divided into sections; I'll include a complete example for build below. If we can't separate options into sections, are we going to be OK with that? (A sketch of one possible workaround follows the example.)

            $ pipetask qgraph --help
            usage: pipetask qgraph [-h] [-L LEVEL|COMPONENT=LEVEL] [--longlog] [--debug]
                                   [-p PATH] [-t TASK[:LABEL]] [--delete LABEL]
                                   [-c LABEL:NAME=VALUE] [-C LABEL:PATH]
                                   [--order-pipeline] [-s PATH] [--pipeline-dot PATH]
                                   [--instrument instrument] [-g PATH] [--skip-existing]
                                   [-q PATH] [--save-single-quanta PATH]
                                   [--qgraph-dot PATH] [-b PATH] [-i COLL,DSTYPE:COLL]
                                   [-o COLL] [--output-run COLL]
                                   [--extend-run | --replace-run]
                                   [--prune-replaced {unstore,purge}] [-d QUERY]
                                   [--show ITEM|ITEM=VALUE]
             
            Build and optionally save pipeline and quantum graph.
             
            optional arguments:
              -h, --help            show this help message and exit
              --show ITEM|ITEM=VALUE
                                    Dump various info to standard output. Possible items
                                    are: `config', `config=[Task::]<PATTERN>' or
                                    `config=[Task::]<PATTERN>:NOIGNORECASE' to dump
                                    configuration fields possibly matching given pattern
                                    and/or task label; `history=<FIELD>' to dump
                                    configuration history for a field, field name is
                                    specified as [Task::][SubTask.]Field; `dump-config',
                                    `dump-config=Task' to dump complete configuration for
                                    a task given its label or all tasks; `pipeline' to
                                    show pipeline composition; `graph' to show information
                                    about quanta; `workflow' to show information about
                                    quanta and their dependency; `tasks' to show task
                                    composition.
             
            Logging options:
              -L LEVEL|COMPONENT=LEVEL, --loglevel LEVEL|COMPONENT=LEVEL
                                    logging level; supported levels are
                                    [trace|debug|info|warn|error|fatal]
              --longlog             use a more verbose format for the logging
              --debug               enable debugging output using lsstDebug facility
                                    (imports debug.py)
             
            Pipeline building options:
              -p PATH, --pipeline PATH
                                    Location of a pipeline definition file in YAML format.
              -t TASK[:LABEL], --task TASK[:LABEL]
                                    Task name to add to pipeline, must be a fully
                                    qualified task name. Task name can be followed by
                                    colon and label name, if label is not given than task
                                    base name (class name) is used as label.
              --delete LABEL        Delete task with given label from pipeline.
              -c LABEL:NAME=VALUE, --config LABEL:NAME=VALUE
                                    Configuration override(s) for a task with specified
                                    label, e.g. -c task:foo=newfoo -c task:bar.baz=3.
              -C LABEL:PATH, --configfile LABEL:PATH
                                    Configuration override file(s), applies to a task with
                                    a given label.
              --order-pipeline      Order tasks in pipeline based on their data
                                    dependencies, ordering is performed as last step
                                    before saving or executing pipeline.
              -s PATH, --save-pipeline PATH
                                    Location for storing resulting pipeline definition in
                                    YAML format.
              --pipeline-dot PATH   Location for storing GraphViz DOT representation of a
                                    pipeline.
              --instrument instrument
                                    Add an instrument which will be used to load config
                                    overrides when defining a pipeline. This must be the
                                    fully qualified class name
             
            Quantum graph building options:
              -g PATH, --qgraph PATH
                                    Location for a serialized quantum graph definition
                                    (pickle file). If this option is given then all input
                                    data options and pipeline-building options cannot be
                                    used.
              --skip-existing       If all Quantum outputs already exist in the output RUN
                                    collection then that Quantum will be excluded from the
                                    QuantumGraph. Requires --extend-run.
              -q PATH, --save-qgraph PATH
                                    Location for storing a serialized quantum graph
                                    definition (pickle file).
              --save-single-quanta PATH
                                    Format string of locations for storing individual
                                    quantum graph definition (pickle files). The curly
                                    brace {} in the input string will be replaced by a
                                    quantum number.
              --qgraph-dot PATH     Location for storing GraphViz DOT representation of a
                                    quantum graph.
             
            Data repository and selection options:
              -b PATH, --butler-config PATH
                                    Location of the gen3 butler/registry config file.
              -i COLL,DSTYPE:COLL, --input COLL,DSTYPE:COLL
                                    Comma-separated names of the input collection(s). Any
                                    entry includes a colon (:), the first string is a
                                    dataset type name that restricts the search in that
                                    collection. May be passed multiple times (all
                                    arguments are concatenated).
              -o COLL, --output COLL
                                    Name of the output CHAINED collection. This may either
                                    be an existing CHAINED collection to use as both input
                                    and output (incompatible with --input), or a new
                                    CHAINED collection created to include all inputs
                                    (requires --input). In both cases, the collection's
                                    children will start with an output RUN collection that
                                    directly holds all new datasets (see --output-run).
              --output-run COLL     Name of the new output RUN collection. If not
                                    provided, --output must be, a new RUN collection will
                                    be created by appending a timestamp to the value
                                    passed with --output. If this collection already
                                    exists, --extend-run must be passed.
              --extend-run          Instead of creating a new RUN collection, insert
                                    datasets into either the one given by --output-run (if
                                    provided) or the first child collection of --output
                                    (which must be of type RUN).
              --replace-run         Before creating a new RUN collection in an existing
                                    CHAINED collection, remove the first child collection
                                    (which must be of type RUN). This can be used to
                                    repeatedly write to the same (parent) collection
                                    during development, but it does not delete the
                                    datasets associated with the replaced run unless
                                    --prune-replaced is also passed. Requires --output,
                                    and incompatible with --extend-run.
              --prune-replaced {unstore,purge}
                                    Delete the datasets in the collection replaced by
                                    --replace-run, either just from the datastore
                                    ('unstore') or by removing them and the RUN completely
                                    ('purge'). Requires --replace-run.
              -d QUERY, --data-query QUERY
                                    User data selection expression.
            Notes:
              * many options can appear multiple times; all values are used, in order
                left to right
              * @file reads command-line options from the specified file:
                * data may be distributed among multiple lines (e.g. one option per line)
                * data after # is treated as a comment and ignored
                * blank lines and lines starting with # are ignored
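
            On the sections question above: Click has no built-in option grouping, but one possible workaround is to override format_options on a Command subclass so that --help renders titled sections. A minimal sketch (the per-option `section` attribute is hypothetical):

            import click


            class SectionedCommand(click.Command):
                """Sketch: render --help options under titled sections."""

                def format_options(self, ctx, formatter):
                    groups = {}
                    for param in self.get_params(ctx):
                        record = param.get_help_record(ctx)
                        if record is not None:
                            # `section` is a hypothetical attribute set when the
                            # option is declared; default to a generic "Options".
                            title = getattr(param, "section", "Options")
                            groups.setdefault(title, []).append(record)
                    for title, rows in groups.items():
                        with formatter.section(title):
                            formatter.write_dl(rows)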
            salnikov Andy Salnikov added a comment -

            Sorry for the delay with reviewing; a few comments on help formatting before I jump to GitHub:

            • the metavar for -C should probably be PATH, not TEXT; same for --pipeline and --pipeline-dot
            • --order-pipeline is a flag, so it should not need TEXT
            • in general I like the metavar info in the old command better; TEXT is too generic and not very helpful
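
            For reference, in Click those fixes would look something like this (a sketch; help text elided):

            import click


            @click.command()
            @click.option("-C", "--config-file", type=click.Path())  # metavar becomes PATH
            @click.option("-p", "--pipeline", type=click.Path())     # likewise PATH
            @click.option("--pipeline-dot", type=click.Path())
            @click.option("--order-pipeline", is_flag=True)          # boolean flag, takes no value
            def build(config_file, pipeline, pipeline_dot, order_pipeline):
                ...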
            tjenness Tim Jenness added a comment -

            Looks like a good start. The grouping of options would be nice, but I take it that Click can't support that (maybe you can make an upstream suggestion to the Click developers). I don't think grouping in the help is worth blocking the migration to Click.

            PS Remember to update the Sphinx documentation to include the command help.

            salnikov Andy Salnikov added a comment -

            Please check my comments on the PR; I think it looks OK, but some details need to be fixed to keep it compatible with the current pipetask.

            I tried to build the branch and scons failed on tests in all three packages, so it does not look quite ready for merge yet.


              People

              Assignee:
              npease Nate Pease [X] (Inactive)
              Reporter:
              npease Nate Pease [X] (Inactive)
              Reviewers:
              Andy Salnikov, Tim Jenness
              Watchers:
              Andy Salnikov, Nate Pease [X] (Inactive), Tim Jenness

