Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: daf_butler
- Labels:
- Story Points: 16
- Epic Link:
- Team: Data Release Production
- Urgent?: No
Description
The need for a near-complete rewrite of the queries subpackage has been building for a long time; there are simply enough major changes (over)due that it makes more sense to start a new subpackage, copy/move what's worth keeping (the expression parsing and validation, at least) there, and work on the new system until it's ready to replace the old one.
Motivations include:
- Making query result objects that are serializable (via pydantic) and that do not have SQLAlchemy objects embedded in them, so they can be used in RemoteRegistry (with transformation to SQLAlchemy done only just before execution in SqlRegistry).
- Making use of the not-really-new-anymore pairwise spatial-overlap tables, for performance, to address DM-31583, and to remove the need for post-query spatial filtering.
- Adding support for user-uploaded data IDs (sets the stage for data ID files on the command-line and new QG-gen algorithms).
- Making spatial joins user-controllable, to finally really fix QG generation limitations in jointcal, FGCM, etc.
- Separating "dataset constraint subquery" logic from "dataset search query" logic, and ensuring the WHERE constraints appear in the optimal place in each (optimizes https://lsstc.slack.com/archives/C01FBUGM2CV/p1631220754411100?thread_ts=1631203913.392200&cid=C01FBUGM2CV)
- Finally making calibration lookups vectorizable.
- Making result objects compatible with the new container ABCs to be introduced on DM-30332.
- Adding frequently-requested support for "what dataset types are in these collections" queries.
This has been on my radar for about a year, but it got preempted by the no-work-found problem and stalled by the introduction of RemoteRegistry, which torpedoed my original prototype. But I think I have a solid enough idea now to try again.
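As a rough illustration of the first motivation above (serializable query objects, with translation to executable SQL deferred until just before execution), here is a minimal sketch. It is not the daf_butler design: `QueryDescription`, `to_sql`, and `execute` are hypothetical names, and standard-library dataclasses, `json`, and `sqlite3` stand in for pydantic and SQLAlchemy so the example runs without extra dependencies. The table and rows are toy data.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class QueryDescription:
    """Hypothetical backend-agnostic query: plain data only, no SQL objects."""

    table: str
    columns: tuple
    where: dict = field(default_factory=dict)  # column name -> required value

    def to_json(self) -> str:
        # Plain data serializes trivially; a client could send this to a server.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "QueryDescription":
        raw = json.loads(text)
        return cls(table=raw["table"], columns=tuple(raw["columns"]), where=raw["where"])


def to_sql(query: QueryDescription):
    """Translate the description to SQL text and bind parameters."""
    # Names are interpolated into the SQL string, so restrict them to identifiers.
    for name in (query.table, *query.columns, *query.where):
        if not name.isidentifier():
            raise ValueError(f"Invalid identifier: {name!r}")
    sql = f"SELECT {', '.join(query.columns)} FROM {query.table}"
    params = []
    if query.where:
        clauses = []
        for column, value in sorted(query.where.items()):
            clauses.append(f"{column} = ?")
            params.append(value)  # values are always bound, never interpolated
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def execute(connection: sqlite3.Connection, query: QueryDescription):
    """Only here, just before execution, does the query touch the SQL backend."""
    sql, params = to_sql(query)
    return connection.execute(sql, params).fetchall()


# Client side: build and serialize the query.
query = QueryDescription(table="visit", columns=("visit", "band"), where={"instrument": "HSC"})
wire = query.to_json()

# Server side: deserialize, then translate and run against a toy database.
restored = QueryDescription.from_json(wire)
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE visit (instrument TEXT, visit INTEGER, band TEXT)")
connection.executemany(
    "INSERT INTO visit VALUES (?, ?, ?)",
    [("HSC", 1, "r"), ("HSC", 2, "i"), ("LSSTCam", 3, "r")],
)
rows = sorted(execute(connection, restored))
print(rows)
```

The point of the split is that everything up to `execute` is plain data, so the same object could in principle be built on a remote client and executed by a SQL-backed registry.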
Issue Links
- blocks
  - DM-30438 Add support for uploading data IDs to temporary tables and vectorize data ID expansion (To Do)
  - DM-31705 Make skypix dimensions usable in regular input, output, and quantum data IDs (To Do)
  - DM-34838 Finish implementing dimension "populated_by" hooks and use them in default config (To Do)
  - DM-34888 add a butler command to list a calibration collection (To Do)
  - DM-37409 Add calibration-collection temporal joins to QueryBuilder (To Do)
  - DM-27660 Materialize dimension spatial relationships and overhaul query system, step 2 (In Progress)
- is parent task of
  - DM-36108 Move daf_butler's Ellipsis typing workaround to utils (Done)
  - DM-36111 Miscellaneous fixes and minor improvements to registry support classes (Done)
  - DM-36174 Pre-daf_relation query system refactoring (Done)
  - DM-36313 Overhaul registry dataset type and collection wildcards (Done)
- is triggered by
  - RFC-878 Minor butler query API changes (Adopted)
  - RFC-890 Add daf_relation as a daf_butler dependency (Implemented)
- is triggering
  - DM-37855 Sorting of dimension records no longer allows order by ID (Done)
- relates to
  - DM-37868 Remove undesirable defensiveness in Registry.findDatasets and fix query truncation bug (Done)
  - DM-37938 Additional fixes for query spatial contraints (Done)
  - DM-38943 Guard against invalid calls to count() in butler query CLI (Done)
  - DM-32403 Support ORDER BY and LIMIT in registry query methods (Done)
  - DM-37450 Respect dataset type storage class in registry query methods (Done)
  - DM-33621 Support reading dataIds from external file (To Do)
  - DM-34263 Discuss Prompt Processing needs from middleware (Done)
I'd be super-amazed myself
I was planning to work on schema migrations today, so there is little chance I can finish a big review by tomorrow.