Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-38498

Rewrite QG generation to make use of PipelineGraph

    XMLWordPrintable

    Details

      Description

      This ticket was originally about making it so NoWorkFound could be raised inside adjustQuantum, as is already documented to be legal:

      The docs for adjustQuantum say that NoWorkFound indicates that a quantum should be pruned: https://github.com/lsst/pipe_base/blob/699de5b7458c89c9c09f5da5995bcc557cd3e7d0/python/lsst/pipe/base/connections.py#L575-L578. However, this is only true at runtime, and not at quantum graph building time where raising this exception will kill the build. For DM-37383 we are finding that the performance overhead of having unpruned quanta is unacceptably slow (partly due to overheads for each quantum at USDF). Fixing this is the most obvious place to make that work faster.

      I also have volunteered that work on DM-37383 would be a guinea pig for battle testing this new functionality.

      But it turns out that problem more-or-less needed a near-complete rewrite of the QG generation code, and I had already been planning a complete rewrite after adding PipelineGraph on DM-33027.  That came with a few more goals:

      • Get rid of the NamedKeyMapping data structures that made it really hard to handle storage class conversions correctly, and finally handle component datasets correctly.
      • Split up independent pipeline subgraphs to divide-and-conquer.
      • Define an interface that could address some code duplication in the couple of custom QG-generation reimplementations we have for very special cases downstream.

      And then, after doing all that, I saw opportunities to easily add some major optimizations that I've long wanted to do (leading to a daf_butler branch), and Nate Lust added a few more optimization commits as well.

        Attachments

          Issue Links

            Activity

            Hide
            jbosch Jim Bosch added a comment -

            I started in on this today, and unfortunately found it not as easy as I'd hoped; the existing QG pruning logic expects the pruning to be based on missing input dataset types, not other quanta, and there are subtle reasons why it's hard to retrofit that (they mostly boil down to "QG generation is a mess, and needs some attention that goes beyond fixing things in a hurry").  The good news is that I have already done a lot of a much more solid and complete fix on another branch, but the bad news is that it's backed up behind DM-33027.  So it may be a few more weeks.

            Show
            jbosch Jim Bosch added a comment - I started in on this today, and unfortunately found it not as easy as I'd hoped; the existing QG pruning logic expects the pruning to be based on missing input dataset types, not other quanta, and there are subtle reasons why it's hard to retrofit that (they mostly boil down to "QG generation is a mess, and needs some attention that goes beyond fixing things in a hurry").  The good news is that I have already done a lot of a much more solid and complete fix on another branch, but the bad news is that it's backed up behind DM-33027 .  So it may be a few more weeks.
            Hide
            jbosch Jim Bosch added a comment -

            Note to self: there is one commit on the branch for this ticket that should be preserved when picking this back up, but it's only a tiny piece of the puzzle.

            Show
            jbosch Jim Bosch added a comment - Note to self: there is one commit on the branch for this ticket that should be preserved when picking this back up, but it's only a tiny piece of the puzzle.
            Hide
            jbosch Jim Bosch added a comment -

            Andy Salnikov, I'm afraid I've got another big review for you, but Nate Lust got the last one and Tim Jenness has gotten a ton of little ones, so you're up.

            I've updated the ticket description to reflect the scope that actually ended up here.  I should also add a bit about what's not here.  I haven't made downstream code in ctrl_mpexec (or anywhere else) use the new QuantumGraphBuilder directly yet, but since the old GraphBuilder interface delegates to the new code it is actually in use.  Removing use of GraphBuilder will be DM-40441.

            Jenkins is still running and I want to do some new benchmarks now that this is fully done (preliminary ones are very promising), but I think it's not worth holding up the review for those. If Jenkins does spot something I'll make sure the changes appear as separate commits.

            The big new additions are the QuantumGraphBuilder base class and its AllDimensionsQuantumGraphBuilder derived class, which provide the bulk of the new QG generation implementation. The quantum_graph_skeleton and prerequisite_helper modules provide helper classes for them. All of these are in pipe_base.

            I've also added some small but critical new code in daf_butler to expand the query system a bit; this lets us do bulk queries for most prerequisite inputs, which yields a huge speedup for some pipelines.

            The other two PRs are ctrl_mpexec and ci_middleware, and are just small changes to tests.

            Test coverage here is hard to gauge, because most of the coverage is in ci_middleware, while most of the changes are in pipe_base.  So I've manually combined coverage from all locations, and here's the result:

            $ diff-cover coverage.xml 
            -------------
            Diff Coverage
            Diff: origin/main...HEAD, staged and unstaged changes
            -------------
            python/lsst/pipe/base/all_dimensions_quantum_graph_builder.py (85.7%): Missing lines 152,192,230-231,263,266-268,303,309,312,318,320,326,468,477,499-502,511-518
            python/lsst/pipe/base/automatic_connection_constants.py (100%)
            python/lsst/pipe/base/graph/graph.py (100%)
            python/lsst/pipe/base/graphBuilder.py (100%)
            python/lsst/pipe/base/pipeline.py (100%)
            python/lsst/pipe/base/pipeline_graph/_dataset_types.py (100%)
            python/lsst/pipe/base/pipeline_graph/_pipeline_graph.py (100%)
            python/lsst/pipe/base/pipeline_graph/_tasks.py (100%)
            python/lsst/pipe/base/prerequisite_helpers.py (79.3%): Missing lines 147,263,278,284,286,334,349,423-424,426,445-448,450,538-539,542,553,556,565,568,577,595,601,631,636-641,643,649-652,655
            python/lsst/pipe/base/quantum_graph_builder.py (88.5%): Missing lines 178,180,183,185,194,364,499,503-508,514-515,517,519,524-525,528,532-533,540,549,556,624,687,692,850-851,895,933-934,956-957,959,1085-1090
            python/lsst/pipe/base/quantum_graph_skeleton.py (94.9%): Missing lines 164,286-288,320,350,393
            -------------
            Total:   943 lines
            Missing: 115 lines
            Coverage: 87%
            -------------
            

            The missing coverage falls into three categories:

            • exception-raises that are just short of assertions (e.g. checks for nonsensical pipelines) that I just don't think are worth working too hard to test;
            • the actual functionality that originally spurred the ticket (being able to raise NoWorkFound in adjustQuantum, and have that propagate to downstream nodes): I'm going to do a one-off test of this using the DM-37383 ticket branches tomorrow, and then after that merges I'll make sure it gets included in ci_middleware;
            • a lot of the code paths in PrerequisiteFinder are fallback code paths used by the QuantumGraphBuilder base class, for which its only concrete subclass (AllDimensionsQuantumGraphBuilder) provides optimized overrides. I'm not sure what to do about testing those.
            Show
            jbosch Jim Bosch added a comment - Andy Salnikov , I'm afraid I've got another big review for you, but Nate Lust got the last one and Tim Jenness has gotten a ton of little ones, so you're up. I've updated the ticket description to reflect the scope that actually ended up here.  I should also add a bit about what's not here.  I haven't made downstream code in ctrl_mpexec (or anywhere else) use the new QuantumGraphBuilder directly yet, but since the old GraphBuilder interface delegates to the new code it is actually in use.  Removing use of GraphBuilder will be DM-40441 . Jenkins is still running and I want to do some new benchmarks now that this is fully done (preliminary ones are very promising), but I think it's not worth holding up the review for those. If Jenkins does spot something I'll make sure the changes appear as separate commits. The big new additions are the QuantumGraphBuilder base class and its AllDimensionsQuantumGraphBuilder derived class, which provide the bulk of the new QG generation implementation. The quantum_graph_skeleton and prerequisite_helper modules provide helper classes for them. All of these are in pipe_base . I've also added some small but critical new code in daf_butler to expand the query system a bit; this lets us do bulk queries for most prerequisite inputs, which yields a huge speedup for some pipelines. The other two PRs are ctrl_mpexec and ci_middleware , and are just small changes to tests. Test coverage here is hard to gauge, because most of the coverage is in ci_middleware , while most of the changes are in pipe_base .  So I've manually combined coverage from all locations, and here's the result: $ diff-cover coverage.xml ------------- Diff Coverage Diff: origin/main...HEAD, staged and unstaged changes ------------- python/lsst/pipe/base/all_dimensions_quantum_graph_builder.py (85.7%): Missing lines 152,192,230-231,263,266-268,303,309,312,318,320,326,468,477,499-502,511-518 python/lsst/pipe/base/automatic_connection_constants.py (100%) python/lsst/pipe/base/graph/graph.py (100%) python/lsst/pipe/base/graphBuilder.py (100%) python/lsst/pipe/base/pipeline.py (100%) python/lsst/pipe/base/pipeline_graph/_dataset_types.py (100%) python/lsst/pipe/base/pipeline_graph/_pipeline_graph.py (100%) python/lsst/pipe/base/pipeline_graph/_tasks.py (100%) python/lsst/pipe/base/prerequisite_helpers.py (79.3%): Missing lines 147,263,278,284,286,334,349,423-424,426,445-448,450,538-539,542,553,556,565,568,577,595,601,631,636-641,643,649-652,655 python/lsst/pipe/base/quantum_graph_builder.py (88.5%): Missing lines 178,180,183,185,194,364,499,503-508,514-515,517,519,524-525,528,532-533,540,549,556,624,687,692,850-851,895,933-934,956-957,959,1085-1090 python/lsst/pipe/base/quantum_graph_skeleton.py (94.9%): Missing lines 164,286-288,320,350,393 ------------- Total: 943 lines Missing: 115 lines Coverage: 87% ------------- The missing coverage falls into three categories: exception-raises that are just short of assertions (e.g. checks for nonsensical pipelines) that I just don't think are worth working too hard to test; the actual functionality that originally spurred the ticket (being able to raise NoWorkFound in adjustQuantum, and have that propagate to downstream nodes): I'm going to do a one-off test of this using the DM-37383 ticket branches tomorrow, and then after that merges I'll make sure it gets included in ci_middleware; a lot of the code paths in PrerequisiteFinder are fallback code paths used by the QuantumGraphBuilder base class, for which its only concrete subclass ( AllDimensionsQuantumGraphBuilder ) provides optimized overrides. I'm not sure what to do about testing those.
            Hide
            jbosch Jim Bosch added a comment -

            Hate to pile more on, but I've created three new PRs in response to a problem discovered in a Jenkins run of ci_hsc.  All changes are trivial and some are just docs, and they're all about the same thing (described in exactly the same way in all three commit messages, so I won't repeat it here):

             

            Show
            jbosch Jim Bosch added a comment - Hate to pile more on, but I've created three new PRs in response to a problem discovered in a Jenkins run of ci_hsc.  All changes are trivial and some are just docs, and they're all about the same thing (described in exactly the same way in all three commit messages, so I won't repeat it here): https://github.com/lsst/drp_pipe/pull/80 https://github.com/lsst/ci_hsc_gen3/pull/104 https://github.com/lsst/rc2_subset/pull/26  
            Hide
            salnikov Andy Salnikov added a comment -

            Looks good to me, though I'd need to spend a lot more time to understand every detail of it.

            Show
            salnikov Andy Salnikov added a comment - Looks good to me, though I'd need to spend a lot more time to understand every detail of it.

              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              erykoff Eli Rykoff
              Reviewers:
              Andy Salnikov
              Watchers:
              Andy Salnikov, Christopher Waters, Eli Rykoff, James Chiang, Jim Bosch, Nate Lust
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.