DM-33600

Inconsistencies in queryDimensionRecords

Details

    • 5
    • Data Access and Database
    • No

    Description

      rowen points out some inconsistencies in the use of queryDimensionRecords.

      TLDR:

      • Passing an instrument that is missing from the repo as a keyword argument raises LookupError, whereas naming the same instrument in the where clause returns zero results.
      • Bind parameters do not work for governor dimensions.

      Using the pipelines_check repo since I have it lying around and it has one instrument (HSC):

      from lsst.daf.butler import Butler
       
      b = Butler("DATA_REPO")
      registry = b.registry
      where = ""
      bind = {}
      for instrument in ("HSC", "LATISS"):
       
          record_iter = b.registry.queryDimensionRecords(
              "exposure",
              instrument=instrument,
              bind=bind,
              where=where,
          )
          print(f"Got {record_iter.count()} results")
      

      This fails on the second loop with:

      Got 1 results
      Traceback (most recent call last):
        File "x.py", line 9, in <module>
          record_iter = b.registry.queryDimensionRecords(
        File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registries/sql.py", line 1102, in queryDimensionRecords
          dataIds = self.queryDataIds(
        File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registries/sql.py", line 1026, in queryDataIds
          standardizedDataId = self.expandDataId(dataId, **kwargs)
        File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registries/sql.py", line 718, in expandDataId
          raise LookupError(
      LookupError: Could not fetch record for required dimension instrument via keys {'instrument': 'LATISS'}.
      

      If, however, I specify the instrument in the where clause instead of using kwargs, it all works fine:

      from lsst.daf.butler import Butler
       
      b = Butler("DATA_REPO")
      registry = b.registry
      where = ""
      bind = {}
      for instrument in ("HSC", "LATISS"):
       
          record_iter = b.registry.queryDimensionRecords(
              "exposure",
              # instrument=instrument,
              bind=bind,
              where=f"instrument = '{instrument}'",
          )
          print(f"Got {record_iter.count()} results")
      

      Result:

      Got 1 results
      Got 0 results
      

      This inconsistency is not great.
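
      Until the two call styles agree, a caller who wants the where-clause behaviour (zero results for an unknown instrument) from the kwargs form could wrap the call. The following is only a minimal sketch, not part of daf_butler: count_dimension_records is a hypothetical helper, and it assumes the kwargs form keeps raising LookupError as in the traceback above.

```python
def count_dimension_records(registry, element, **data_id):
    """Count dimension records, treating an unknown data ID value as "no matches".

    Hypothetical helper for illustration. ``registry`` is expected to behave
    like ``Butler.registry`` in the examples above: ``queryDimensionRecords``
    returns an object with a ``count()`` method.
    """
    try:
        records = registry.queryDimensionRecords(element, **data_id)
        return records.count()
    except LookupError:
        # Assumption: this is the "Could not fetch record for required
        # dimension" failure shown above, i.e. the value is not in the repo.
        return 0
```

      Note that LookupError is a broad base class (KeyError and IndexError derive from it), so a wrapper like this could also hide unrelated failures. It papers over the inconsistency rather than fixing it.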

      Furthermore, if we commit to the where clause and pass the instrument through bind, I can't get it to work at all:

      from lsst.daf.butler import Butler
       
      b = Butler("DATA_REPO")
      registry = b.registry
      for instrument in ("HSC", "LATISS"):
       
          record_iter = b.registry.queryDimensionRecords(
              "exposure",
              # instrument=instrument,
              bind={"inst": instrument},
              where="instrument = inst",
          )
          print(f"Got {record_iter.count()} results")
      

      which gives:

      Traceback (most recent call last):
        File "x.py", line 15, in <module>
          print(f"Got {record_iter.count()} results")
        File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registry/queries/_results.py", line 1102, in count
          return self._dataIds.count(exact=exact)
        File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registry/queries/_results.py", line 450, in count
          return self._query.count(self._db, exact=exact)
        File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registry/queries/_results.py", line 201, in _query
          self._cached_query = self._query_factory(self._order_by, self._limit)
        File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registries/sql.py", line 1056, in query_factory
          summary = queries.QuerySummary(
        File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registry/queries/_structs.py", line 395, in __init__
          self.where = expression.attach(
        File "/Users/timj/work/lsstsw3/build/daf_butler/python/lsst/daf/butler/registry/queries/_structs.py", line 161, in attach
          raise RuntimeError(msg) from None
      RuntimeError: Error in query expression "instrument = inst": No value(s) for governor dimensions {instrument} in expression that references dependent dimensions. 'Governor' dimensions must always be specified completely in either the query expression (via simple 'name=<value>' terms, not 'IN' terms) or in a data ID passed to the query method.
      

      This seems to imply that governor dimensions are checked before the bind parameters are handled. Other bind parameters seem to work fine.
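
      Since the literal form instrument = 'HSC' does work in the second example, one stopgap is to build the where string directly instead of using bind for the governor dimension. The sketch below is an illustration under stated assumptions: instrument_where is a hypothetical helper, and the set of characters it accepts is a guess at what instrument names look like, not a rule taken from daf_butler.

```python
import re

# Assumption: instrument names contain only letters, digits, "_", "-" and ".".
# Rejecting anything else keeps quotes out of the generated expression.
_SAFE_INSTRUMENT = re.compile(r"[A-Za-z0-9_.\-]+")


def instrument_where(instrument):
    """Return a where-clause term that names the instrument as a literal."""
    if not _SAFE_INSTRUMENT.fullmatch(instrument):
        raise ValueError(f"Refusing to embed instrument name {instrument!r} in a query")
    return f"instrument = '{instrument}'"
```

      The result can be passed as where=instrument_where(instrument), or combined with other terms using AND, while any non-governor values continue to go through bind.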

            People

              salnikov Andy Salnikov
              tjenness Tim Jenness
              Jim Bosch
              Andy Salnikov, Jim Bosch, Russell Owen, Tim Jenness