Details
-
Type:
Story
-
Status: Done
-
Resolution: Done
-
Fix Version/s: None
-
Component/s: ip_isr
-
Labels:
-
Story Points:1
-
Epic Link:
-
Team:Data Release Production
-
Urgent?:No
Description
The custom lookupFunction for crosstalkSources at
turns out to be painfully inefficient. For each exposure-detector combination, we are:
- Looking up all (per-detector) crosstalkSources datasets for that exposure, in a findFirst=True queryDatasets call. That means we perform the exact same query n_detectors times for each exposure, and because of findFirst=True it's a gnarly one with a CTE and complex subqueries. Unfortunately the lookupFunction runs once per quantum, and there's no way to share information across quanta, so at present we can't really change this.
- We then call .expanded(), which uses the previous dataset query as a subquery constraint when querying for the records for each relevant dimension (instrument, exposure, detector, physical_filter, band). When the constraining subquery is simple and fast, but the number of records returned is potentially large (e.g. when we actually materialize the constraining subquery into a temporary table), that's an efficient way to do data ID expansion, relative to the alternative, Registry.expandDataId, which that looks up one record value at a time given explicit primary key values known to Python. But here we already have the exposure record in hand, and it turns out all of the other dimensions are configured to be cached by Registry, so amortized expandDataId in a loop should involve no queries.
I can make the change, but I'm hoping for help from some combination of Eli Rykoff and Lee Kelvin to test it.
Interesting. Definitely an improvement, but not as much as I would have hoped given my testing. However, I now realize I was testing just ISR while you're doing the full process ccd. Some quick benchmarking shows that the tall pole now appears to be the spatial matching to the refcats. That is for another investigation and ticket. (I'm hoping, though, that this can get the flat generation rolling because that shouldn't have to worry about spatial matching).