Fix Version/s: None
Sprint: AP F22-1 (June)
Collate the timing output of a run on the large DC2 dataset (4 patches with >200 visits), and compute summary statistics on it (mean, median, stddev, min/max, top/bottom 5%).
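A minimal sketch of the summary statistics step, assuming the collated timings have already been gathered into a flat list of per-visit durations in seconds (the `timing_stats` helper name is hypothetical, not part of the pipeline):

```python
import numpy as np

def timing_stats(times):
    """Summary statistics for a collection of per-visit task timings (seconds)."""
    t = np.asarray(times, dtype=float)
    return {
        "mean": t.mean(),
        "median": np.median(t),
        "stddev": t.std(),          # population stddev; use t.std(ddof=1) for sample
        "min": t.min(),
        "max": t.max(),
        "p05": np.percentile(t, 5),   # bottom-5% cutoff
        "p95": np.percentile(t, 95),  # top-5% cutoff
    }
```

The 5th/95th percentiles give the "top/bottom 5%" boundaries; reporting the mean of the values beyond those cutoffs would be a small variation on the same code.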
In addition, we should look at how long `timing_diaPipe_associator` takes as a function of "number of the visit in the processing sequence" (which may not be the visit number, if we cannot ensure an ordering). We can use the DiaObject records in the APDB to compute (mean, median, or max?) nDiaSources per visit as a proxy for processing order (i.e. plot association time vs. number of objects).
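A sketch of the join behind that plot, assuming per-visit association timings and per-visit DiaSource rows are already in hand as DataFrames; the column names (`visit`, `assoc_time_s`, `diaSourceId`) and the function name are illustrative placeholders, not the real metric or APDB column names:

```python
import pandas as pd

def assoc_time_vs_nsources(timing_df, dia_src_df):
    """Join per-visit association timings with per-visit DiaSource counts.

    Column names are assumptions for illustration only; check them
    against the actual metric output and APDB schema.
    """
    counts = (dia_src_df.groupby("visit")["diaSourceId"]
                        .size()
                        .rename("n_dia_sources")
                        .reset_index())
    return timing_df.merge(counts, on="visit", how="inner")
```

The result can then be plotted directly, e.g. `merged.plot.scatter(x="n_dia_sources", y="assoc_time_s")`.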
Hi Meredith Rawls, looks good. A couple of small-ish suggestions on the notebook:
- I think it would be helpful to see the DiaPipe processing time vs. MJD, colored by detector-visit
- I would also make the same plot for all of the existing `metricvalue_ap_association_*` timing metrics so we can (perhaps) isolate more clearly where the scaling is coming from. I'm hopeful it is just the catalog loading step.
- Can we estimate how many historical DIASources there are at each timestep?
- Go ahead and add a few explanatory text cells throughout
- Delete extraneous debugging cells that print long dictionaries
I made the requested notebook edits, and already merged to main (minor whoops). If you want additional edits, let me know; otherwise this should be all done. The sprint Confluence page is also updated.
It is straightforward to collate timing metrics for all tasks, except for association at the level of detail desired. I'd like to pair-program with Eric Bellm to figure out how to pull this info out of the DiaObject table.
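One possible starting point for pulling source counts out of the DiaObject table: a plain SQL query over the APDB. The snippet below builds a mock in-memory SQLite table just so the query is runnable; the real APDB layout, column names (`nDiaSources` is an assumption), and backend will differ and need checking against the actual schema.

```python
import sqlite3
import statistics

# Mock DiaObject-like table; the real APDB schema and backend differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DiaObject (diaObjectId INTEGER, nDiaSources INTEGER)")
conn.executemany("INSERT INTO DiaObject VALUES (?, ?)",
                 [(1, 5), (2, 12), (3, 7)])

# Pull the per-object source counts, then summarize in Python.
counts = [row[0] for row in conn.execute("SELECT nDiaSources FROM DiaObject")]
print(statistics.median(counts))  # -> 7
```

Mapping these counts back to individual visits is the part that still needs working out, since DiaObject rows aggregate over many visits.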