Uploaded image for project: 'Request For Comments'
  1. Request For Comments
  2. RFC-949

Deprecate and replace low-level QuantumGraph generation and Pipeline inspection interfaces

    XMLWordPrintable

Details

    • RFC
    • Status: Adopted
    • Resolution: Unresolved
    • DM
    • None

    Description

      Two nearly-complete tickets - DM-33027 and DM-38498 - will overhaul our code for building QuantumGraphs, and I'd like to deprecate some of the interfaces they obsolete (or nearly obsolete, with the rest of the work to be done while implementing this RFC).

      This includes (all symbols in lsst.pipe.base):

      • The TaskDef class in its entirety: a sequence of TaskDef objects is currently how we represent a fully-configured, topologically ordered pipeline, and DM-33027's PipelineGraph will do a much better job of that.
      • The Pipeline.toExpandedPipeline and Pipeline._iter_ methods: these return those sequences/iterators of TaskDef. These are effectively being replaced by Pipeline.to_graph, which returns a PipelineGraph.
      • The PipelineDatasetTypes and TaskDatasetTypes classes, in their entirety: these are how we currently extract the dataset types used by a pipeline, but they've actually been pretty fundamentally broken since we introduced the concept for storage class overrides during execution (a problem we've been hacking around in the QG generation algorithm ever since), PipelineGraph will replace these as well.
      • The BaseConnection.makeDatasetType method: this is now only used by PipelineDatasetTypes and TaskDatasetTypes.
      • The pipeTools module: this contains free functions for sorting pipelines topologically, which is also now a PipelineGraph responsibility.
      • The GraphBuilder class and graphBuilder module: DM-38498 will introduce a new more extensible QuantumGraphBuilder class with slightly different interface (to support that extensibility), and make the old GraphBuilder delegate to it, but there's no reason to keep GraphBuilder around long after that.
      • The QuantumGraph constructor will be modified to take a PipelineGraph instance as a new required argument, and the quanta argument will become a mapping with str (task label) keys instead of TaskDef keys. The QuantumGraph.taskGraph property will be replaced with a QuantumGraph.pipeline_graph property. QuantumGraph currently has a simpler pipeline-graph-like object embedded within it, and it makes sense to use the full PipelineGraph there now once we have it. The findTasksWithInput, findTasksWithOutput, tasksWithDSType, findTaskDefByName, and findTaskDefByLabel methods will also be removed as redundant with functionality provided by PipelineGraph.
      • The QuantumGraph getQuantaForTask, getNumberOfQuantaForTask, and getNodesForTask methods will be modified to take a str task label instead of a TaskDef instance.
      • The QuantumGraph constructor will also be modified by removing the pruneRefs argument, and the QuantumGraph.pruneGraphFromRefs method will be removed as well. The new QuantumGraphBuilder algorithm will take care of pruning the graph according to adjustQuantum calls as it builds the graph (which is more efficient and much simpler than doing it later), and I believe future use cases for pruning already-built graphs will involve using failed quanta (not failed or nonexistent datasets) as the primary input, so we'll want to rework those interfaces anyway.  These deprecations may be deferred until their long-term replacements are more clear.

      And in lsst.ctrl.mpexec:

      • All QuantumExecutor, SingleQuantumExector, and TaskFactory methods that take TaskDef will be modified to take lsst.pipe.base.pipeline_graph.TaskNode instances instead, with both supported during deprecation, and the keyword argument name changed from taskDef to task_node.
      • SeparablePipelineExecutor.make_quantum_graph will need to deprecate and replace its builder argument. I think it's clear that it'd maintain the original intent for this to be able to accept an arbitrary subclass of QuantumGraphBuilder instead, we can't just pass that directly interface, because QuantumGraphBuilder instances are given all of their inputs at construction, so we can't just pass in a fully-constructed instance if we want to keep pipeline as a separate argument.  I think it also needs to drop its where argument, as the fact that there is a (single) where expression for the full graph is no longer an assumption of the base class, so it'd be more appropriate to roll this into the replacement for builder as well - and this is just one example of a subclass-dependent constructor argument.  So I think the replacement would either be a factory callback that binds any subclass-specific arguments inside it, or a QuantumGraphBuilder type + arbitrary kwargs to pass to its constructor in addition to the base-class arguments.
      • SimplePipelineExecutor.from_pipeline will be modified to accept a PipelineGraph as well as a Pipeline for its first argument, and support for Sequence[TaskDef] there will be deprecated.

      These interfaces are mostly used internally by other middleware code that will be straightforward to update, but there is some direct usage of some of these in tests and test utility code (which should also be easy to fix, but more visible to non-middleware developers). Outside lsst_distrib I would guess that only the Prompt Processing system is using any of these, and I don't anticipate any real difficulty updating code there, either.

      Attachments

        Issue Links

          Activity

            Parejkoj John Parejko added a comment -

            I don't have a strong opinion about most of this, except for TaskDef. Currently, TaskDef's init is the only place we reliably freeze Task configs; DM-971 made it so that we validate configs on Task init, but we don't yet freeze them there; that's DM-972. I don't know if the new PipelineGraph continues the "validate and freeze configs" behavior, but we want to get the rest of that in Task's init anyway.

            If PipelineGraph doesn't instantiate the Tasks themselves until they are to be run, then validating and freezing should happen much earlier (in PipelineGraph's init, probably), in addition to happening in the Task init. I've wondered if we can instantiate the Tasks in a pipeline earlier and preserve those instances, as a possible optimization for AP, but that's probably outside the scope of this RFC.

            I've added DM-972 as related.

            Parejkoj John Parejko added a comment - I don't have a strong opinion about most of this, except for TaskDef . Currently, TaskDef 's init is the only place we reliably freeze Task configs; DM-971 made it so that we validate configs on Task init, but we don't yet freeze them there; that's DM-972 . I don't know if the new PipelineGraph continues the "validate and freeze configs" behavior, but we want to get the rest of that in Task's init anyway. If PipelineGraph doesn't instantiate the Tasks themselves until they are to be run, then validating and freezing should happen much earlier (in PipelineGraph 's init, probably), in addition to happening in the Task init. I've wondered if we can instantiate the Tasks in a pipeline earlier and preserve those instances, as a possible optimization for AP, but that's probably outside the scope of this RFC. I've added DM-972 as related.
            jbosch Jim Bosch added a comment -

            PipelineGraph validates and freezes configs when a task is added to it, just ask TaskDef does.  Like TaskDef,  it also instantiates the task's Connections class (but not the task itself) at that time.

            Instantiating the tasks themselves requires init-output datasets to be passed from one task to another (at present, this is done by writing to and reading from a butler, but this isn't the only way to do it).   I don't think inside PipelineGraph would be the best place to do that (though it's not a crazy idea) but PipelineGraph would help with any code that does do that, because makes it easy to identify those datasets and the tasks that produce and consume them.  This is related to DM-38041, which should be unblocked by DM-33027.

            jbosch Jim Bosch added a comment - PipelineGraph validates and freezes configs when a task is added to it, just ask TaskDef does.  Like TaskDef ,  it also instantiates the task's Connections class (but not the task itself) at that time. Instantiating the tasks themselves requires init-output datasets to be passed from one task to another (at present, this is done by writing to and reading from a butler, but this isn't the only way to do it).   I don't think inside PipelineGraph would be the best place to do that (though it's not a crazy idea) but  PipelineGraph would help with any code that does do that, because makes it easy to identify those datasets and the tasks that produce and consume them.  This is related to DM-38041 , which should be unblocked by DM-33027 .

            krzys, I'd appreciate your input on how to handle SeparablePipelineExecutor.make_quantum_graph; this is effectively the follow-up to a long-ago PR conversation (I'd claimed the boundary between GraphBuilder construction arguments and makeGraph arguments was arbitrary, and now I've finally thought about it what it ought to be).

            • Only the constructor takes arguments now; the build method does not, because the builder objects are single-use.
            • Subclasses may add their own constructor arguments to control implementation-specific behavior.

            That sounds reasonable, but it looks like the where/userQuery parameter has been dropped entirely. How do you constrain the new system to produce a graph for only certain data IDs?

            krzys Krzysztof Findeisen added a comment - krzys , I'd appreciate your input on how to handle SeparablePipelineExecutor.make_quantum_graph ; this is effectively the follow-up to a long-ago PR conversation (I'd claimed the boundary between GraphBuilder construction arguments and makeGraph arguments was arbitrary, and now I've finally thought about it what it ought to be). Only the constructor takes arguments now; the build method does not, because the builder objects are single-use. Subclasses may add their own constructor arguments to control implementation-specific behavior. That sounds reasonable, but it looks like the where / userQuery parameter has been dropped entirely. How do you constrain the new system to produce a graph for only certain data IDs?
            jbosch Jim Bosch added a comment -

            where is now a keyword argument to the constructor of the concrete derived class that will be our new general-purpose QG builder, because most of the other builder implementations I have in mind would either have no where argument or several.

            jbosch Jim Bosch added a comment - where is now a keyword argument to the constructor of the  concrete derived class that will be our new general-purpose QG builder, because most of the other builder implementations I have in mind would either have no where argument or several.
            jbosch Jim Bosch added a comment -

            I've adopted this with four triggering tickets:

            • DM-38498 may deprecate some things that happen to have no remaining callers after its own primary is complete; I think this will probably just be BaseConnection.makeDatasetType, PipelineDatasetTypes, and TaskDatasetTypes.
            • DM-40441 is next, and will deprecate GraphBuilder and various things that PipelineGraph now does better, without trying to change TaskDef and QuantumGraph.
            • DM-40442 will finish adding deprecations, focusing on QuantumGraph's interfaces that use TaskDef.  As this will involve changes to QuantumGraph internals as well, it's possible t will be preceded by another ticket that does that internal work and adds some of the replacement interfaces.
            • DM-40443 will remove deprecated interfaces, after the deprecation timer runs its course.
            jbosch Jim Bosch added a comment - I've adopted this with four triggering tickets: DM-38498 may deprecate some things that happen to have no remaining callers after its own primary is complete; I think this will probably just be BaseConnection.makeDatasetType , PipelineDatasetTypes , and TaskDatasetTypes . DM-40441 is next, and will deprecate GraphBuilder and various things that PipelineGraph now does better, without trying to change TaskDef and QuantumGraph . DM-40442 will finish adding deprecations, focusing on QuantumGraph 's interfaces that use TaskDef .  As this will involve changes to QuantumGraph internals as well, it's possible t will be preceded by another ticket that does that internal work and adds some of the replacement interfaces. DM-40443 will remove deprecated interfaces, after the deprecation timer runs its course.

            People

              jbosch Jim Bosch
              jbosch Jim Bosch
              Eli Rykoff, Jim Bosch, John Parejko, Krzysztof Findeisen, Lee Kelvin, Michelle Gower, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Planned End:

                Jenkins

                  No builds found.