Details
Type: Story
Status: Done
Resolution: Done
Fix Version/s: None
Component/s: ctrl_mpexec, pipe_base
Labels:
Story Points: 16
Epic Link:
Team: Data Release Production
Urgent?: No
Description
We've come across a new QuantumGraph (QG) generation problem that's catastrophically slow, as detailed on this thread. Unfortunately neither DM-31548 nor DM-31583 resolves it, though both help a bit, and I believe it's an important one for DRP and maybe DP0.2.
The slowdown is in the big initial query, which in this case involves both a very constraining tract expression and a very large number of dataset subqueries (exacerbated by the large number of collections being searched). We need the query planner to start with the tract constraint and only check for dataset existence within each tract. I can manually rewrite the query to essentially force that behavior (or something like it; the 40-minute query I get back is still much slower than I want, but it's tolerable for now, and I'm bad at reading query plans). Doing that kind of rewrite all the time, however, would make other QG generation queries catastrophically slow, because sometimes it's a dataset/collection subquery that has the most constraining power.
So it seems we really need levers for users to control which dataset types (if any) enter that big initial query. Because we already perform follow-up queries for all input dataset types, there's no need for query rewriting; we "just" need to stop assuming that those follow-up queries always return results for all data IDs, probably by pruning the QG in Python after constructing it. After consultation with Nate Lust, we'd like to give that a try, especially because:
- it means this ticket shouldn't get blocked on butler query improvements that I might promise but never deliver;
- it should get us some QG pruning support that could prove useful in other ways, such as in the long-promised DM-21904, which our conversation reaffirmed as the long-term goal (assuming I can actually deliver those butler query improvements).
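The pruning idea above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipe_base implementation: `Quantum`, `prune_quanta`, and the `(dataset type, data ID)` tuple keys are hypothetical stand-ins, and the sketch assumes every input of a quantum is strictly required (real tasks may tolerate some missing inputs). A quantum survives only if each of its inputs was either found by the follow-up queries or is produced by another surviving quantum; pruning repeats until nothing changes, so removals propagate downstream.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantum:
    """Hypothetical stand-in for one task execution in a QuantumGraph."""

    task: str
    data_id: tuple
    inputs: frozenset  # dataset keys this quantum requires (all assumed mandatory)
    outputs: frozenset  # dataset keys this quantum would produce


def prune_quanta(quanta: Iterable[Quantum], existing_inputs: set) -> list[Quantum]:
    """Drop quanta whose required inputs can never exist.

    ``existing_inputs`` holds the overall-input dataset keys that the
    follow-up dataset queries actually found. Iterate to a fixed point so
    that pruning one quantum also prunes everything downstream of it.
    """
    kept = list(quanta)
    while True:
        # Datasets that could exist: found inputs plus outputs of surviving quanta.
        produced = set().union(*(q.outputs for q in kept))
        available = set(existing_inputs) | produced
        survivors = [q for q in kept if q.inputs <= available]
        if len(survivors) == len(kept):
            return survivors
        kept = survivors


# Hypothetical example: two per-visit quanta feed one per-tract quantum.
isr1 = Quantum("isr", ("visit", 1), frozenset({("raw", 1)}), frozenset({("postISR", 1)}))
isr2 = Quantum("isr", ("visit", 2), frozenset({("raw", 2)}), frozenset({("postISR", 2)}))
coadd = Quantum(
    "coadd",
    ("tract", 0),
    frozenset({("postISR", 1), ("postISR", 2)}),
    frozenset({("coadd", 0)}),
)

# Only raw 1 was found, so isr2 is pruned, and coadd (which needs its output) follows.
pruned = prune_quanta([isr1, isr2, coadd], {("raw", 1)})
```

Iterating to a fixed point keeps the sketch simple; a real implementation would more likely walk the graph once in topological order.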
Yusra AlSayyad and Tim Jenness, I hope you don't mind Nate Lust working on this next, as I think it's a pretty high priority. I think it's probably a 2-3 week project (maybe much faster if the pruning is easy, but neither of us remembered the code structure well enough to be confident of that).
Issue Links
blocks
- DM-32245 Reprocess HiTS AP with fakes and an APDB (Done)
relates to
- DM-32376 ap_verify gen3 fails to find jointcal_photoCalib dataset in graph generation (Done)
- DM-30703 Reprocess DECam HiTS data from scratch with background fixes (Done)
- DM-32058 Duplicate faro task in pipeline gives cryptic error (Done)
We've been iterating on the PR for a while now. There are a couple of small things I'd like to see improved before merge, but I don't think I need to take another look to check them unless you'd like me to.