I've just finished the first profiles of QuantumGraph generation with this branch. Let's start by recapping where things stood at the outset (basically the status at the end of DM-24734); I've re-run those profiles against w_2020_25 and am reporting the more recent results to make sure the comparison with this branch is fair.
The test builds a single-tract quantum graph for the full DRP Gen3 pipeline, with the database (SQLite, on /project) loaded with all three RC2 tracts.
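The timings below come from a mix of log timestamps and profiler output. As a minimal sketch of that kind of measurement, assuming a plain standard-library cProfile run (the wrapper below is my illustration, not the actual test script):

```python
import cProfile
import pstats

def profile_call(func, *args, **kwargs):
    """Run ``func`` under cProfile and report the top cumulative-time entries."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)  # profile only this call
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(30)
    return result
```

Sorting by cumulative time is what makes it possible to attribute wall-clock time to the high-level steps listed below.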
On w_2020_25 (should be the same as master), total time was 57m54s:
- 26m37s: executing the big initial query, expanding its data IDs, and associating quantum and dataset data IDs. These steps are not easily separated in the logs on master because they're interleaved; the profile lets us break them down, but the times are measured slightly differently, so the breakdown is approximate:
  - 2m0s: executing the big initial query
  - 16m40s: expanding data IDs
  - 8m35s: associating quantum and dataset data IDs (probably slightly overestimated relative to the others)
- 1m6s: follow-up queries for regular inputs (raw and deepCoadd_skyMap)
- 27m28s: follow-up queries for ISR prerequisites (calibrations)
- 1m9s: follow-up queries for calibrate prerequisites (refcats)
On this branch, total time was 54m51s:
- 8m13s: executing the big initial query, expanding its data IDs, and associating quantum and dataset data IDs. These steps are now separable in the logs, yielding:
  - 1m25s: executing the big initial query
  - 0m2s: expanding data IDs
  - 6m46s: associating quantum and dataset data IDs
- 0m4s: follow-up queries for regular inputs (raw and deepCoadd_skyMap)
- 42m57s: follow-up queries for ISR prerequisites (calibrations)
- 1m21s: follow-up queries for calibrate prerequisites (refcats)
At first glance it looks like things barely improved: some steps got faster while the single slowest step got slower. But if you connect this to what the changes were supposed to do, it's really promising. My goals on this ticket were:
- Speed up data ID expansion: absolutely crushed this, going from ~17m to 2s.
- Speed up "normal" (non-prerequisite) dataset lookups: crushed this, going from 1m down to 4s.
- Set the stage for DM-24432, where I'd work on speeding up calibration lookups. That's what's gotten slower here, because I didn't try to replace the loop over (of order) 20-50k complex, single-row queries with something vectorized; instead, constructing each of those queries got a bit slower, because the normal dataset lookup path (which I did vectorize) shares some of the same code. A toy illustration of the two query patterns follows after this list.
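To make that distinction concrete, here's a toy sqlite3 sketch of the two query patterns. This is only a sketch: the real registry queries involve dimension joins and (for calibrations) validity-range logic, and the table and column names here are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dataset (detector INT, exposure INT, dataset_id INT);
    INSERT INTO dataset VALUES (0, 100, 1), (1, 100, 2), (0, 101, 3);
""")
data_ids = [(0, 100), (1, 100), (0, 101)]

# Per-quantum pattern (current prerequisite lookups): one round trip per
# data ID, repeated (of order) 20-50k times.
slow = [
    conn.execute(
        "SELECT dataset_id FROM dataset WHERE detector = ? AND exposure = ?",
        data_id,
    ).fetchone()
    for data_id in data_ids
]

# Vectorized pattern (schematically, what the normal-input lookups now do):
# load all wanted data IDs into a temporary table once, then resolve them
# with a single join.
conn.execute("CREATE TEMPORARY TABLE wanted (detector INT, exposure INT)")
conn.executemany("INSERT INTO wanted VALUES (?, ?)", data_ids)
fast = conn.execute(
    "SELECT d.detector, d.exposure, d.dataset_id "
    "FROM dataset AS d JOIN wanted AS w "
    "ON d.detector = w.detector AND d.exposure = w.exposure"
).fetchall()
```

Moving the calibration lookups to something like the second pattern is the shape of the change DM-24432 would bring.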
So, I think this ticket has accomplished its goals (assuming these results hold up in PostgreSQL), and it's time to clean it up and move on to DM-24432. I'm pretty confident I can get that ~40m block down to <5m at worst, and possibly <1m.
The part of these profiles that will be hardest to improve is probably the ~7m we spend associating quantum and dataset data IDs, as that's pure Python with no database work involved. (Note that there's been no real attempt to optimize the database indices or structure to improve these queries, aside from the big-picture changes we've made since the last round of RC2+Gen3 work over a year ago.) Dropping spurious overlaps (DM-24024, for HSC at least) should help by reducing the amount of work actually done there, but I don't really know how much. There might be a way to use the database more in that step, especially now that the thing we iterate over there is already materialized in a temporary table, and DM-21904 would probably move us in that direction anyway; a rough sketch of that idea follows below. It's something to ponder once DM-24432 is done and we've got QG generation down from almost an hour to (hopefully) 10-12 minutes.
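As a strawman for that "use the database more" idea, here's a sqlite3 sketch of pushing the quantum/dataset grouping into SQL, given rows that are already materialized in a temporary table. Everything here is hypothetical (the `qgraph_rows` table, its columns, and the `group_concat` aggregation are invented for illustration); the real association step deals with many more dimensions and tasks.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TEMP TABLE qgraph_rows (
        visit INT, detector INT,   -- quantum data ID dimensions
        dataset_id INT             -- a dataset resolved by the big query
    );
    INSERT INTO qgraph_rows VALUES (100, 0, 1), (100, 0, 2), (100, 1, 3);
""")

# Pure-Python association (roughly what the ~7m step does): iterate over
# every row and group dataset IDs under each quantum data ID.
quanta = defaultdict(list)
for visit, detector, dataset_id in conn.execute(
    "SELECT visit, detector, dataset_id FROM qgraph_rows"
):
    quanta[visit, detector].append(dataset_id)

# Database-side alternative: let SQL do the grouping, so Python sees only
# one row per quantum instead of one row per (quantum, dataset) pair.
grouped = conn.execute(
    "SELECT visit, detector, group_concat(dataset_id) "
    "FROM qgraph_rows GROUP BY visit, detector"
).fetchall()
```

Whether the aggregation would actually beat the Python loop needs measurement; the point is just that the temporary table makes a SQL-side version possible.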
Current design sketch: https://gist.github.com/TallJimbo/1a2262743e5bfc34a8c1dd6100ea98d9