Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: ctrl_mpexec, daf_butler, pipe_base
- Labels:
- Story Points: 6
- Epic Link:
- Sprint: DRP S19-3
- Team: Data Release Production
Description
Nate Lust reports that trying to build a single QuantumGraph for all of the PipelineTasks in ci_hsc runs into SQLite's hard-coded maximum of 64 joins in a single query. I can think of several ways to avoid this, many involving temporary tables (I know those work differently in Oracle, but I think my ideas are compatible with that).
For the demo we'll plan on just running the pipelines in three stages, for which QuantumGraph generation works fine. If everything else is going extremely well early in the week I may take a stab at resolving this, too.
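For anyone who wants to see the limit in isolation, here is a minimal sketch using only Python's standard `sqlite3` module. The table `t` and the helper `try_join` are made up for illustration and have nothing to do with the butler schema. It assumes the SQLite library linked into Python uses the default 64-bit join bitmask, in which case the cap applies to the number of tables in one join.

```python
import sqlite3


def try_join(n_tables):
    """Self-join a toy table n_tables times.

    Returns None if the query runs, otherwise the SQLite error message.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
        conn.execute("INSERT INTO t (id) VALUES (1)")
        # Chain the aliases: t0 JOIN t1 ON t1.id = t0.id JOIN t2 ON ...
        sql = "SELECT count(*) FROM t AS t0"
        for i in range(1, n_tables):
            sql += f" JOIN t AS t{i} ON t{i}.id = t{i - 1}.id"
        try:
            conn.execute(sql).fetchone()
        except sqlite3.OperationalError as exc:
            return str(exc)
        return None
    finally:
        conn.close()


if __name__ == "__main__":
    for n in (8, 65):
        print(n, try_join(n))
```

A small join should come back as `None`, while a join over 65 aliases should come back with an error message from SQLite instead of a result.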
Issue Links
- relates to: DM-17611 Performance optimizations to data ID code (Done)
Simplest approach seems to have done the trick: I've removed all output datasets from the big selectDimensions query, since they didn't constrain it anyway, and we were already not relying on that query returning dataset_ids for them.
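To make the effect on the join count concrete, here is a rough sketch. Every name in it (`tables_to_join`, the `dataset_` prefix, the counts in the usage note) is hypothetical and is not the daf_butler API or schema; it only illustrates that leaving output dataset types out of the combined query shrinks the number of joined tables.

```python
# Hypothetical names throughout; this is not the daf_butler API.
SQLITE_MAX_JOIN_TABLES = 64  # assumed cap on tables in one SQLite join


def tables_to_join(dimension_tables, input_dataset_types,
                   output_dataset_types, include_outputs=False):
    """List the tables a combined dimensions query would join."""
    tables = list(dimension_tables)
    tables += [f"dataset_{name}" for name in input_dataset_types]
    if include_outputs:
        # Per the comment above, output datasets did not constrain the
        # query, so joining them only adds to the table count.
        tables += [f"dataset_{name}" for name in output_dataset_types]
    return tables


def fits_in_one_query(tables):
    """True if the join stays within the assumed SQLite cap."""
    return len(tables) <= SQLITE_MAX_JOIN_TABLES
```

With, say, 10 dimension tables, 20 input dataset types and 40 output dataset types (numbers invented for the example), including the outputs gives 70 joined tables, which is over the cap, while leaving them out gives 30.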
After that, some profiling revealed that the slow overall speed of preflight (12 minutes for ci_hsc!) was not due to that query: the time was going into slow, mostly-Python data ID manipulations afterwards. All but the first commit on this branch are optimizations for that. It's down to 5 minutes now, and while I have ideas on how to go further, I think they're out of scope for this ticket.
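For reference, this kind of split between query time and Python-side time can be checked with the standard library's `cProfile` and `pstats`. The sketch below is not the profiling actually run for this ticket; `run_query`, `normalize_data_ids` and `preflight` are hypothetical stand-ins for the real steps.

```python
import cProfile
import io
import pstats


def run_query():
    """Hypothetical stand-in for the database query step."""
    return sum(range(1000))


def normalize_data_ids(n):
    """Hypothetical stand-in for Python-side data ID manipulation."""
    return [dict(sorted({"visit": i, "detector": i % 100}.items()))
            for i in range(n)]


def preflight():
    """Hypothetical stand-in for the whole preflight step."""
    run_query()
    return normalize_data_ids(20000)


def profile_report(func, top=10):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(preflight))
```

Sorting by cumulative time makes it easy to see which callee of the top-level function dominates, which is the comparison that matters here.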
Andy Salnikov, could you look at the first commit?
Nate Lust, could you look at the rest?
PR is: https://github.com/lsst/daf_butler/pull/124