Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-27140

Make concrete pipeline definitions usable and hide those that aren't

    XMLWordPrintable

    Details

    • Story Points:
      2
    • Epic Link:
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      When more people start using Gen3, we want them to find pipeline definitions in a long-term stable location, to avoid a lot of bad cargo-culting from the beginning.  At present, the only truly usable pipeline definition (in the sense that it includes all necessary overloads) is in ci_hsc_gen3, while pipe_tasks includes many pipeline definitions that look usable but actually aren't.

      Here's one proposal for how to fix that, involving only naming conventions and not new code:

      • Move/split ci_hsc_gen3/pipelines/CiHsc.yaml to obs_subaru/pipelines/DataReleaseProduction.yaml
      • Create similar definitions in obs_lsst (for at least imSim) and obs_decam
      • Rename pipe_tasks/pipelines/DataReleaseProduction.yaml to give it a leading underscore, indicating that it is intended only for inheritance or composition in other pipelines, not direct execution.
      • Remove ProcessCcd.yaml, Forced.yaml, Multiband.yaml, and Coaddition.yaml from pipe_tasks (merging them into DataReleaseProduction.yaml), or rename them with an underscore prefixes as well.  Since I don't think we want to also make obs_* versions of these, I would prefer to just remove them.
      • Consider renaming any [_]DataReleaseProduction.yaml to [_]DRP.yaml; given how often this will be typed on the command line, I suspect this move would be appreciated.

      This of course raises (at least) two questions about some functionality changes:

      • Should we (also) have a non-convention way to mark a pipeline definition as "abstract", and not directly runnable, to further prevent devs from thinking they can run a pipe_tasks pipeline just by passing --instrument to the pipetask invocation?  I think we should at least use an underscore-prefix convention (that's very visible, and visibility is good here), but if we can easily do more than that, it could prevent some cases of silently incorrect processing (in the sense that important configs could be missed), and that's a sufficiently serious issue that we should guard carefully against it.
      • Should we have a way to label sub-pipelines within a Pipeline definition?  I think we clearly need one; the original intent of the sub-pipelines in pipe_tasks was good, even if it didn't work out.
      • Should we remove the instrument command-line argument from pipetask entirely?  If we are encouraging pipeline yaml files to contain per-instrument config overrides directly ever (and I think Nate Lust is on record saying we should prefer that to config-file overrides in obs_ packages, in the long-term) then it is never safe to assume that just passing --instrument to pipetask is sufficient.

      I am open to solving these problems via functionality changes instead of just moving pipelines around, if it's easy enough, but we need to do something before Nov. 1.

       

        Attachments

          Issue Links

            Activity

            Hide
            nlust Nate Lust added a comment -

            I don't think that we should tell people to never use the pipe_task pipelines. They can be useful when wanting to build your own, or to not bring in any overrides when trying to run something. I do think the _ convention does convey "use only if you know what you are doing"

            What do you have against sub pipelines defined that then get integrated into a final one? I think it will make it MUCH more readable that having long files. I really don't see the advantage to changing this to labels within a longer document. I do not know why we would not make obs_* versions of these too. It is also possible that different obs packages will chose to make them, and some may bundle them differently.

            I know I would prefer to drop the instrument label, but I know that some people who have poked at gen3 do like it (and specifying the task with -t) I suspect that is because pipelines need time to become familiar.

            I like the name DRP, but in general we were encouraging people to use class name conventions with pipelines to get some sort of uniformity in naming. We could make an exception for DRP because it is so commonly used, but I fear the wild west if there is no convention.

            Show
            nlust Nate Lust added a comment - I don't think that we should tell people to never use the pipe_task pipelines. They can be useful when wanting to build your own, or to not bring in any overrides when trying to run something. I do think the _ convention does convey "use only if you know what you are doing" What do you have against sub pipelines defined that then get integrated into a final one? I think it will make it MUCH more readable that having long files. I really don't see the advantage to changing this to labels within a longer document. I do not know why we would not make obs_* versions of these too. It is also possible that different obs packages will chose to make them, and some may bundle them differently. I know I would prefer to drop the instrument label, but I know that some people who have poked at gen3 do like it (and specifying the task with -t) I suspect that is because pipelines need time to become familiar. I like the name DRP, but in general we were encouraging people to use class name conventions with pipelines to get some sort of uniformity in naming. We could make an exception for DRP because it is so commonly used, but I fear the wild west if there is no convention.
            Hide
            czw Christopher Waters added a comment -

            I'm happy to switch to whatever the final standard for pipeline definitions is chosen.  I've written the cp_pipe pipelines under the assumption that they are essentially the _ versions described here: a base that should be inherited by a obs_ package specific version that defines the instrument and any additional config overrides.  Reading this discussion makes me think I should have split the existing pipeline definitions into sub-pipelines: one to handle the particular ISR processing of the input data, and a second for the combination of that data into a final product.  This would make the testing of algorithms/config parameters (which are universally in the combination step) easier to run separate from the ISR processing, allowing a single ISR run to feed into multiple combinations.

            My only other comment is that haven't used class naming conventions, but that's just another style issue that I'm happy to change.

            Show
            czw Christopher Waters added a comment - I'm happy to switch to whatever the final standard for pipeline definitions is chosen.  I've written the cp_pipe pipelines under the assumption that they are essentially the _ versions described here: a base that should be inherited by a obs_ package specific version that defines the instrument and any additional config overrides.  Reading this discussion makes me think I should have split the existing pipeline definitions into sub-pipelines: one to handle the particular ISR processing of the input data, and a second for the combination of that data into a final product.  This would make the testing of algorithms/config parameters (which are universally in the combination step) easier to run separate from the ISR processing, allowing a single ISR run to feed into multiple combinations. My only other comment is that haven't used class naming conventions, but that's just another style issue that I'm happy to change.
            Hide
            jbosch Jim Bosch added a comment -

            Nate Lust, could you also take care of these small changes on this ticket:

            • rename the makeWarpTask label to just makeWarp
            • rename the charImage label to characterizeImage (so it's not unnecessarily different from CharacterizeImageTask._DefaultName)

            ?

            Show
            jbosch Jim Bosch added a comment - Nate Lust , could you also take care of these small changes on this ticket: rename the makeWarpTask label to just makeWarp rename the charImage label to characterizeImage (so it's not unnecessarily different from CharacterizeImageTask._DefaultName ) ?
            Hide
            nlust Nate Lust added a comment -

            will do

            Show
            nlust Nate Lust added a comment - will do
            Hide
            nlust Nate Lust added a comment -

            I think this is ready to review. This has expanded a bit to touch a few more packages, but it is fairly minor in all but pipe_base.

            Show
            nlust Nate Lust added a comment - I think this is ready to review. This has expanded a bit to touch a few more packages, but it is fairly minor in all but pipe_base.
            Hide
            jbosch Jim Bosch added a comment - - edited

            Looks good!

            (Lots of little things, mostly style, mostly in pipe_base.)

            Show
            jbosch Jim Bosch added a comment - - edited Looks good! (Lots of little things, mostly style, mostly in pipe_base.)
            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            Just found out about this ticket. Can you please update https://github.com/lsst/ap_pipe/blob/master/pipelines/ApPipe.yaml to use the new _SingleFrame.yaml?

            Show
            krzys Krzysztof Findeisen added a comment - - edited Just found out about this ticket. Can you please update https://github.com/lsst/ap_pipe/blob/master/pipelines/ApPipe.yaml to use the new _SingleFrame.yaml ?

              People

              Assignee:
              nlust Nate Lust
              Reporter:
              jbosch Jim Bosch
              Reviewers:
              Jim Bosch
              Watchers:
              Christopher Waters, Hsin-Fang Chiang, Jim Bosch, Krzysztof Findeisen, Nate Lust, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.