Details
-
Type:
Story
-
Status: Done
-
Resolution: Done
-
Fix Version/s: None
-
Component/s: daf_butler
-
Labels:
-
Story Points:2
-
Epic Link:
-
Team:Data Release Production
-
Urgent?:No
Description
Users have recently noticed that they can no longer perform queries where the WHERE constraint references dimensions that are not included in the dimensions of the resultant data IDs.
This is a new and accidental limitation, and I think it came from two seemingly-unrelated changes:
- On
DM-26590, I accidentally fixed a bug (in that there was a bug, but I didn't realize it, and refactoring made it go away) in which the "target" dimensions the users passes for the data IDs a query should return were not being respected (they were instead being expanded to include dimensions referenced in the WHERE expression). This went unnoticed for so long because usually the WHERE expression dimensions were a subset of the target dimensions, and the code in QG gen didn't care if the target dimensions were expanded as long as the dimensions it requested were present (except, perhaps, by being a bit less efficient).
- Earlier, on
DM-25919, I added a check that raised an exception where I knew the query system couldn't always do the right thing (yet); this was targeted at the QG gen case where we query for a bunch of data IDs, dump those in a temporary table, and then join against that temp table when querying for input datasets. I knew that we couldn't reliably query for datasets whose dimensions were not a subset of those the temp table, because that code doesn't try to add any spatial or temporal joins (this is hard, because the temp table might already embed some of those). But because (at the time) that other unnoticed bug was still in play, I didn't notice that this check was too aggressive, and also gave up unnecessarily in other queries when there was no temporary data ID table involved.
The right fix for this is to have the query system carry around both the target dimensions and the full, expanded-to-include-WHERE dimensions separately, and perform this check against the latter. I had always planned to do this anyway as part of a bigger set of improvements (most recently represented by DM-27660), but with this regression it's worth moving that particular change up in priority.
This was even simpler to fix and harder to trigger than I originally expected: the query system already tracked both the complete set of dimensions that need to be joined into the query (QuerySummary.mustHaveKeysJoined) separately from the desired data ID dimensions (QuerySummary.requested), and the problem was just that this check used the wrong one. And it could only be triggered by queries that involve spatial joins (or data ID temp tables), because that's the only code path that would run into this check at present.
Tim Jenness, mind reviewing this one-line-fix (with a 10 line test and some mypy stuff thrown in)?
https://github.com/lsst/daf_butler/pull/468