I've finished a run with profiling enabled (and the bug that killed the last one avoided by disabling ForcedPhotCcdTask, the last step in the pipeline, and one often skipped for other reasons).
First off, the profiling overheads appear to be significant, unless something else (load, caching) changed under me: the last run took just under 4 hours total, while this one took about 6 hours.
A breakdown from status logging:
- 23m: executing and iterating over rows from the big initial query
- 10m: resolving (finding dataset_ids and runs) for input datasets (almost entirely raws)
- 1h20m: assembling "isr" quanta (includes calibration dataset lookups)
- 1h10m: assembling "charImage" quanta
- 2h: assembling "calibrate" quanta (includes refcat lookups)
- 30m: assembling "makeWarp" quanta
- 5m: assembling quanta for all other tasks combined
And the big news is that the profile (to the extent it can be trusted, given the overheads) says almost all of the time is going into Registry.relateDataIds, which we use to test each Quantum + DatasetRef pair for compatibility while assembling the Quantum. I think the vastly different amounts of time spent on different tasks just correspond to the number of such combinations for those tasks (given their DatasetTypes and the number of data IDs for them), and now that I think about it, there's no way that testing scales linearly with the size of the graph: it's got to be more like O(N^2), because the number of tests scales as the product of the number of Quanta and the number of DatasetRefs.
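To make the scaling concrete, here's a minimal sketch of the pairwise pattern described above. The function names and data ID representation are invented for illustration (this is not the real Registry API); `relate_data_ids` stands in for Registry.relateDataIds, treating two data IDs as compatible when they agree on every dimension key they share:

```python
def relate_data_ids(a: dict, b: dict) -> bool:
    """Stand-in for Registry.relateDataIds (illustrative only): two data IDs
    are compatible if they agree on every dimension key they have in common."""
    return all(b[k] == v for k, v in a.items() if k in b)

def assemble_quanta(quanta_data_ids, dataset_refs):
    """O(N*M): every quantum is tested against every candidate DatasetRef,
    so the number of relate calls is len(quanta_data_ids) * len(dataset_refs)."""
    calls = 0
    graph = []
    for q in quanta_data_ids:
        inputs = []
        for ref in dataset_refs:
            calls += 1
            if relate_data_ids(q, ref):
                inputs.append(ref)
        graph.append((q, inputs))
    return graph, calls

# 30 quanta and 30 matching refs -> 900 compatibility tests, even though
# each quantum ends up with exactly one input.
quanta = [{"visit": v, "detector": d} for v in range(10) for d in range(3)]
refs = [{"visit": v, "detector": d, "dataset": "raw"} for v in range(10) for d in range(3)]
graph, calls = assemble_quanta(quanta, refs)
print(calls)  # 900
```

The point of the toy numbers: doubling both the number of Quanta and the number of DatasetRefs quadruples the number of tests, which is consistent with the per-task times above tracking the product of the two counts.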
The good news is that I think I see a way to restructure things to avoid calling Registry.relateDataIds (and this O(N^2) behavior) entirely, because the rows of the big initial query tell us everything we need about Quantum + DatasetRef relationships. But that will take some restructuring because we've thrown those rows away by the time we assemble the quanta themselves.
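A sketch of what I mean, with invented row/field names (the real query rows are Registry internals, not this shape): if each row of the big initial query already pairs a quantum data ID with a dataset reference, a single O(rows) grouping pass replaces all of the pairwise relateDataIds tests:

```python
from collections import defaultdict

def assemble_from_rows(rows):
    """Group (quantum data ID, DatasetRef) pairs straight from query rows.

    One pass over the rows; the row itself asserts the Quantum/DatasetRef
    relationship, so no per-pair compatibility test is needed.
    """
    quanta = defaultdict(list)
    for row in rows:
        quanta[row["quantum_data_id"]].append(row["dataset_ref"])
    return dict(quanta)

# Illustrative rows: the keys and values here are made up.
rows = [
    {"quantum_data_id": ("visit", 1), "dataset_ref": "raw/1"},
    {"quantum_data_id": ("visit", 1), "dataset_ref": "bias/1"},
    {"quantum_data_id": ("visit", 2), "dataset_ref": "raw/2"},
]
print(assemble_from_rows(rows))
# {('visit', 1): ['raw/1', 'bias/1'], ('visit', 2): ['raw/2']}
```

The restructuring cost mentioned above is exactly the cost of keeping these rows (or the grouping built from them) alive until quantum-assembly time instead of discarding them after the query.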
An aside for Nate Lust: I'm a bit worried the design you're implementing may run into the same scaling problem, or something very much like it. It would be worthwhile to do the thought experiment of walking through all of the steps, looking explicitly for any whose number of operations scales as the product of the numbers of two kinds of data IDs.