Details
-
Type:
Story
-
Status: Won't Fix
-
Resolution: Done
-
Fix Version/s: None
-
Component/s: daf_butler
-
Story Points:4
-
Epic Link:
-
Sprint:DB_S20_02
-
Team:Data Access and Database
-
Urgent?:No
Description
Our current approach to spatial indexing on the sphere is simply to store the HTM IDs each region overlaps at a particular level, in an extra "common skypix overlap" table for every table that has a region column. This might be an okay approach for regions of a certain size, but it's definitely not a good approach for regions of arbitrary size, and probably not close to the best approach even for its ideal region size.
Doing better would probably require using special index types built for spatial overlap queries (R trees and variants thereof). While the databases of interest to us (SQLite, Oracle, and PostgreSQL) do not support indexing regions on the sphere natively, they all do provide ways to index 3-d bounding boxes, and the lsst.sphgeom objects we work with provide ways to get 3-d bounding boxes (for a unit sphere).
The interfaces for the three databases are quite different, however, so the big challenge is probably encapsulating that functionality behind a consistent interface for the rest of our Registry code to use. We might be able to leverage SQLAlchemy hooks for some of this customization, but I doubt they will cover everything. While Oracle and PostgreSQL both treat geometries as column types and spatial indexes much like normal indexes, they have different syntax for inserting, querying, and relating them. SQLite is even more different: its R tree implementation involves creating a special table for every spatial column, in which lower and upper bounds behave like regular (but separate) columms. Adding at least some of the abstraction to our Database class might be a lot easier, and no less usable, though at present it doesn't do much with SELECT queries (they're just passed through to SQLAlchemy directly). And I don't think this functionality needs to be totally transparent - just database-generic. It would be totally reasonable for Registry code that's using our Database interface to work with a spatial column to have to do things it wouldn't have to do with a normal column, like manually extracting bounding boxes from regions for inserts, calling Database methods to get SQLAlchemy expressions that represent overlaps in query join/where clauses, or doing post-query filtering in Python (as we do now).
This ticket is for adding that abstraction layer to some combination of SQLAlchemy and the Database class. I'll create another one for actually using it in Registry (maybe after the design for this one is further along).