Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: None
- Story Points: 24
- Epic Link:
- Sprint: DB_F20_06
- Team: Data Access and Database
- Urgent?: No
Description
Begin conversion of the existing pipetask command in ctrl_mpexec to the Click framework, based on (or similar to) the butler command.
Implement the build subcommand. My goal is to rewrite/refactor as little of the existing implementation code as possible. There is room for improvement (e.g. not propagating an 'args' container), but it would be better to make such fixes after the argparse framework is removed, so we don't have to maintain parallel implementations.
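As a rough illustration of what the butler-style Click layout looks like for this kind of subcommand (a minimal sketch only; the command name `pipetask2` comes from this ticket, but the option set and echoed output here are illustrative, not the actual ctrl_mpexec implementation):

```python
import click


@click.group()
def cli():
    """Illustrative top-level group, standing in for the pipetask2 entry point."""


@cli.command(short_help="Build a pipeline definition.")
@click.option("-p", "--pipeline", type=click.Path(),
              help="Location of a pipeline definition file in YAML format.")
@click.option("-t", "--task", "tasks", multiple=True, metavar="TASK[:LABEL]",
              help="Task name to add to pipeline.")
@click.option("-s", "--save-pipeline", type=click.Path(),
              help="Location for storing resulting pipeline definition in YAML format.")
def build(pipeline, tasks, save_pipeline):
    """Build and optionally save pipeline definition.

    This does not require input data to be specified.
    """
    # The real implementation would forward these values to the existing
    # argparse-era builder code; this sketch just echoes them.
    click.echo(f"pipeline={pipeline} tasks={list(tasks)} save={save_pipeline}")
```

Keeping the command body a thin shim over the existing code is what lets the argparse and Click front ends coexist during the transition.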
Attachments
Issue Links
- is duplicated by:
  - DM-24508 Add lsst log message init to Click CLI tools (Done)
Activity
So this ticket is an initial conversion with the build command only? Can you make the ticket title more explicit about that please?
Yes. Changed.
New help text is:

$ pipetask2 build --help
Usage: pipetask2 build [OPTIONS]

  Build and optionally save pipeline definition.

  This does not require input data to be specified.

Options:
  -t, --task TEXT           Task name to add to pipeline, must be a fully
                            qualified task name. Task name can be followed by
                            colon and label name, if label is not given than
                            task base name (class name) is used as label.
  --show TEXT               Dump various info to standard output. Possible
                            items are: `config', `config=[Task::]<PATTERN>' or
                            `config=[Task::]<PATTERN>:NOIGNORECASE' to dump
                            configuration fields possibly matching given
                            pattern and/or task label; `history=<FIELD>' to
                            dump configuration history for a field, field name
                            is specified as [Task::][SubTask.]Field; `dump-
                            config', `dump-config=Task' to dump complete
                            configuration for a task given its label or all
                            tasks; `pipeline' to show pipeline composition;
                            `graph' to show information about quanta;
                            `workflow' to show information about quanta and
                            their dependency; `tasks' to show task
                            composition.
  -s, --save-pipeline TEXT  Location for storing resulting pipeline definition
                            in YAML format.
  -p, --pipeline TEXT       Location of a pipeline definition file in YAML
                            format.
  --pipeline-dot PATH       Location for storing GraphViz DOT representation
                            of a pipeline.
  --order-pipeline TEXT     Order tasks in pipeline based on their data
                            dependencies, ordering is performed as last step
                            before saving or executing pipeline.
  -i, --instrument TEXT     Add an instrument which will be used to load
                            config overrides when defining a pipeline. This
                            must be the fully qualified class name.
  --delete TEXT             Delete task with given label from pipeline.
  -c, --config TEXT         Config override, as a key-value pair.
  -C, --config-file TEXT    Path to a pex config override to be included after
                            the Instrument config overrides are applied.
  -h, --help                Show this message and exit.
The existing pipetask command provided some guidance in the option type meta info, e.g. --show ITEM|ITEM=VALUE instead of --show TEXT. Do we want to preserve these annotations?
Also, the pipetask help is divided into sections; I'll include a complete example below. If we can't separate options into sections, are we going to be ok with that?
$ pipetask qgraph --help
usage: pipetask qgraph [-h] [-L LEVEL|COMPONENT=LEVEL] [--longlog] [--debug]
                       [-p PATH] [-t TASK[:LABEL]] [--delete LABEL]
                       [-c LABEL:NAME=VALUE] [-C LABEL:PATH]
                       [--order-pipeline] [-s PATH] [--pipeline-dot PATH]
                       [--instrument instrument] [-g PATH] [--skip-existing]
                       [-q PATH] [--save-single-quanta PATH]
                       [--qgraph-dot PATH] [-b PATH] [-i COLL,DSTYPE:COLL]
                       [-o COLL] [--output-run COLL]
                       [--extend-run | --replace-run]
                       [--prune-replaced {unstore,purge}] [-d QUERY]
                       [--show ITEM|ITEM=VALUE]

Build and optionally save pipeline and quantum graph.

optional arguments:
  -h, --help            show this help message and exit
  --show ITEM|ITEM=VALUE
                        Dump various info to standard output. Possible items
                        are: `config', `config=[Task::]<PATTERN>' or
                        `config=[Task::]<PATTERN>:NOIGNORECASE' to dump
                        configuration fields possibly matching given pattern
                        and/or task label; `history=<FIELD>' to dump
                        configuration history for a field, field name is
                        specified as [Task::][SubTask.]Field; `dump-config',
                        `dump-config=Task' to dump complete configuration for
                        a task given its label or all tasks; `pipeline' to
                        show pipeline composition; `graph' to show information
                        about quanta; `workflow' to show information about
                        quanta and their dependency; `tasks' to show task
                        composition.

Logging options:
  -L LEVEL|COMPONENT=LEVEL, --loglevel LEVEL|COMPONENT=LEVEL
                        logging level; supported levels are
                        [trace|debug|info|warn|error|fatal]
  --longlog             use a more verbose format for the logging
  --debug               enable debugging output using lsstDebug facility
                        (imports debug.py)

Pipeline building options:
  -p PATH, --pipeline PATH
                        Location of a pipeline definition file in YAML format.
  -t TASK[:LABEL], --task TASK[:LABEL]
                        Task name to add to pipeline, must be a fully
                        qualified task name. Task name can be followed by
                        colon and label name, if label is not given than task
                        base name (class name) is used as label.
  --delete LABEL        Delete task with given label from pipeline.
  -c LABEL:NAME=VALUE, --config LABEL:NAME=VALUE
                        Configuration override(s) for a task with specified
                        label, e.g. -c task:foo=newfoo -c task:bar.baz=3.
  -C LABEL:PATH, --configfile LABEL:PATH
                        Configuration override file(s), applies to a task with
                        a given label.
  --order-pipeline      Order tasks in pipeline based on their data
                        dependencies, ordering is performed as last step
                        before saving or executing pipeline.
  -s PATH, --save-pipeline PATH
                        Location for storing resulting pipeline definition in
                        YAML format.
  --pipeline-dot PATH   Location for storing GraphViz DOT representation of a
                        pipeline.
  --instrument instrument
                        Add an instrument which will be used to load config
                        overrides when defining a pipeline. This must be the
                        fully qualified class name

Quantum graph building options:
  -g PATH, --qgraph PATH
                        Location for a serialized quantum graph definition
                        (pickle file). If this option is given then all input
                        data options and pipeline-building options cannot be
                        used.
  --skip-existing       If all Quantum outputs already exist in the output RUN
                        collection then that Quantum will be excluded from the
                        QuantumGraph. Requires --extend-run.
  -q PATH, --save-qgraph PATH
                        Location for storing a serialized quantum graph
                        definition (pickle file).
  --save-single-quanta PATH
                        Format string of locations for storing individual
                        quantum graph definition (pickle files). The curly
                        brace {} in the input string will be replaced by a
                        quantum number.
  --qgraph-dot PATH     Location for storing GraphViz DOT representation of a
                        quantum graph.

Data repository and selection options:
  -b PATH, --butler-config PATH
                        Location of the gen3 butler/registry config file.
  -i COLL,DSTYPE:COLL, --input COLL,DSTYPE:COLL
                        Comma-separated names of the input collection(s). Any
                        entry includes a colon (:), the first string is a
                        dataset type name that restricts the search in that
                        collection. May be passed multiple times (all
                        arguments are concatenated).
  -o COLL, --output COLL
                        Name of the output CHAINED collection. This may either
                        be an existing CHAINED collection to use as both input
                        and output (incompatible with --input), or a new
                        CHAINED collection created to include all inputs
                        (requires --input). In both cases, the collection's
                        children will start with an output RUN collection that
                        directly holds all new datasets (see --output-run).
  --output-run COLL     Name of the new output RUN collection. If not
                        provided, --output must be, a new RUN collection will
                        be created by appending a timestamp to the value
                        passed with --output. If this collection already
                        exists, --extend-run must be passed.
  --extend-run          Instead of creating a new RUN collection, insert
                        datasets into either the one given by --output-run (if
                        provided) or the first child collection of --output
                        (which must be of type RUN).
  --replace-run         Before creating a new RUN collection in an existing
                        CHAINED collection, remove the first child collection
                        (which must be of type RUN). This can be used to
                        repeatedly write to the same (parent) collection
                        during development, but it does not delete the
                        datasets associated with the replaced run unless
                        --prune-replaced is also passed. Requires --output,
                        and incompatible with --extend-run.
  --prune-replaced {unstore,purge}
                        Delete the datasets in the collection replaced by
                        --replace-run, either just from the datastore
                        ('unstore') or by removing them and the RUN completely
                        ('purge'). Requires --replace-run.
  -d QUERY, --data-query QUERY
                        User data selection expression.

Notes:
  * many options can appear multiple times; all values are used, in order
    left to right
  * @file reads command-line options from the specified file:
    * data may be distributed among multiple lines (e.g. one option per line)
    * data after # is treated as a comment and ignored
    * blank lines and lines starting with # are ignored
Sorry for the delay with reviewing; a few comments on the help formatting before I jump to GitHub:
- the meta name for -C should probably be PATH, not TEXT; same for --pipeline and --pipeline-dot
- --order-pipeline is a flag, it should not need TEXT
- in general I like the meta info in the old command better; TEXT is too generic and not very helpful
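In Click terms the fixes suggested above map onto `type=click.Path(...)` (rendered as PATH in the help), `is_flag=True` (no value meta at all), and an explicit `metavar` to restore the richer annotations of the old command. A minimal sketch, with hypothetical help strings:

```python
import click


@click.command()
@click.option("-C", "--config-file", type=click.Path(exists=True),
              help="Path to a pex config override file.")     # rendered as PATH
@click.option("--order-pipeline", is_flag=True,
              help="Order tasks by their data dependencies.")  # flag: takes no value
@click.option("--show", metavar="ITEM|ITEM=VALUE", multiple=True,
              help="Dump various info to standard output.")    # custom meta info
def build(config_file, order_pipeline, show):
    click.echo(f"order={order_pipeline} show={list(show)}")
```

With these declarations the generated help reads `-C, --config-file PATH`, `--order-pipeline` (no trailing TEXT), and `--show ITEM|ITEM=VALUE`.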
Looks like a good start. The grouping of options would be nice, but I take it that Click can't support that (maybe you can make an upstream suggestion to the Click developers). I don't think grouping in the help is worth blocking the migration to Click.
PS Remember to update the Sphinx docs to include the command help.
Please check my comments on PR, I think it looks OK but some details need to be fixed to keep it compatible with current pipetask.
I tried to build the branch and scons failed on tests in all three packages, so it does not look quite ready for merge yet.
I added a command to ctrl_mpexec called pipetask2. Once we're ready to switch, we can delete the existing pipetask command and related infrastructure and rename the new command, removing the '2'.
The build subcommand is implemented; perhaps Andy Salnikov should have a look?
The commit called "convert unit tests to mixins" is a fairly substantial change to a mixin-based test strategy. Earlier commits on the branch make it so the butler command loader can be reused with pipetask. Everything else is minor cleanup, plus the implementation of shared options for pipetask2.
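As a rough illustration of the mixin idea (the class names and the trivial `build` command here are hypothetical, not the actual test code): checks shared across subcommands live in a plain mixin class, and each command's TestCase combines that mixin with unittest.TestCase.

```python
import unittest

import click
from click.testing import CliRunner


@click.command()
def build():
    """Build and optionally save pipeline definition."""


class HelpTestMixin:
    """Checks shared by every subcommand's test case (hypothetical names)."""

    command = None  # each concrete TestCase plugs in its click command

    def test_help(self):
        # Shared check: every subcommand must respond cleanly to --help.
        result = CliRunner().invoke(self.command, ["--help"])
        self.assertEqual(result.exit_code, 0)
        self.assertIn("--help", result.output)


class BuildTestCase(HelpTestMixin, unittest.TestCase):
    command = build
```

The payoff is that adding tests for another subcommand is one short class combining the same mixin with a different `command`.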