Details
- Type: Bug
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: daf_butler
- Labels:
- Story Points: 5
- Team: Data Access and Database
- Urgent?: No
Description
Russell Owen points out some inconsistencies in the use of queryDimensionRecords.
TLDR:
- Inconsistency between using kwargs for missing instruments vs where clause with missing instrument.
- Bind parameter not working for governor dimensions.
Using the pipelines_check repo since I have it lying around and it has one instrument (HSC):
from lsst.daf.butler import Butler

b = Butler("DATA_REPO")
registry = b.registry
where = ""
bind = {}
for instrument in ("HSC", "LATISS"):

    record_iter = b.registry.queryDimensionRecords(
        "exposure",
        instrument=instrument,
        bind=bind,
        where=where,
    )

    print(f"Got {record_iter.count()} results")
This fails on the second loop with:
Got 1 results
Traceback (most recent call last):
  File "x.py", line 9, in <module>
    record_iter = b.registry.queryDimensionRecords(
  File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registries/sql.py", line 1102, in queryDimensionRecords
    dataIds = self.queryDataIds(
  File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registries/sql.py", line 1026, in queryDataIds
    standardizedDataId = self.expandDataId(dataId, **kwargs)
  File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registries/sql.py", line 718, in expandDataId
    raise LookupError(
LookupError: Could not fetch record for required dimension instrument via keys {'instrument': 'LATISS'}.
If, however, I specify the instrument in the where clause instead of using kwargs, it all works fine:
from lsst.daf.butler import Butler

b = Butler("DATA_REPO")
registry = b.registry
where = ""
bind = {}
for instrument in ("HSC", "LATISS"):

    record_iter = b.registry.queryDimensionRecords(
        "exposure",
        # instrument=instrument,
        bind=bind,
        where=f"instrument = '{instrument}'",
    )

    print(f"Got {record_iter.count()} results")
Result:
Got 1 results
Got 0 results
This inconsistency is not great.
Furthermore, if we commit to the where clause and pass the instrument through bind, I can't get it to work at all:
from lsst.daf.butler import Butler

b = Butler("DATA_REPO")
registry = b.registry
for instrument in ("HSC", "LATISS"):

    record_iter = b.registry.queryDimensionRecords(
        "exposure",
        # instrument=instrument,
        bind={"inst": instrument},
        where="instrument = inst",
    )

    print(f"Got {record_iter.count()} results")
which gives:
Traceback (most recent call last):
  File "x.py", line 15, in <module>
    print(f"Got {record_iter.count()} results")
  File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registry/queries/_results.py", line 1102, in count
    return self._dataIds.count(exact=exact)
  File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registry/queries/_results.py", line 450, in count
    return self._query.count(self._db, exact=exact)
  File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registry/queries/_results.py", line 201, in _query
    self._cached_query = self._query_factory(self._order_by, self._limit)
  File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registries/sql.py", line 1056, in query_factory
    summary = queries.QuerySummary(
  File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registry/queries/_structs.py", line 395, in __init__
    self.where = expression.attach(
  File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registry/queries/_structs.py", line 161, in attach
    raise RuntimeError(msg) from None
RuntimeError: Error in query expression "instrument = inst": No value(s) for governor dimensions {instrument} in expression that references dependent dimensions. 'Governor' dimensions must always be specified completely in either the query expression (via simple 'name=<value>' terms, not 'IN' terms) or in a data ID passed to the query method.
Does this imply that governor dimensions are checked before the bind parameters are handled? Other bind parameters seem to work fine.
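One way to sidestep this, pending a fix, is to keep the governor dimension as a literal in the expression and reserve bind for everything else. A minimal sketch follows; instrument_where is a hypothetical helper, not a daf_butler function, and it rejects quote characters rather than escaping them because the quoting rules of the expression parser are not covered in this ticket. The bound identifier name and value are arbitrary placeholders.

```python
def instrument_where(instrument: str, extra: str = "") -> str:
    """Build a where string with the instrument inlined as a literal.

    Hypothetical helper. Quote characters are rejected so that the value
    cannot terminate the string literal early.
    """
    if "'" in instrument or '"' in instrument:
        raise ValueError(f"Unsupported instrument name: {instrument!r}")
    clause = f"instrument = '{instrument}'"
    return f"{clause} AND ({extra})" if extra else clause


# Non-governor values can still go through bind. The identifier
# "my_exposure" and its value are illustrative placeholders only.
where = instrument_where("HSC", "exposure = my_exposure")
bind = {"my_exposure": 12345}
print(where)
```

The resulting where and bind would then be passed to queryDimensionRecords in place of the "instrument = inst" form above.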
I will be content if and only if the exception raised for an unknown governor value is both predictable and different from the exception raised for invalid queries (invalid in the database sense). My service does not want to know or care about your special "governor" dimensions. It wants to treat registry as much like a database server as it possibly can. Catching a special "this is a valid query, and if this were a database this call would return no records" exception is certainly tolerable.
It almost sounds like Registry is the wrong API to be using, but I don't know of any alternative.