Details
-
Type:
Story
-
Status: Won't Fix
-
Resolution: Done
-
Fix Version/s: None
-
Component/s: daf_butler
-
Story Points:4
-
Epic Link:
-
Sprint:DB_S20_02
-
Team:Data Access and Database
-
Urgent?:No
Description
Our current approach to spatial indexing on the sphere is simply to store the HTM IDs each region overlaps at a particular level, in an extra "common skypix overlap" table for every table that has a region column. This might be an okay approach for regions of a certain size, but it's definitely not a good approach for regions of arbitrary size, and probably not close to the best approach even for its ideal region size.
Doing better would probably require using special index types built for spatial overlap queries (R trees and variants thereof). While the databases of interest to us (SQLite, Oracle, and PostgreSQL) do not support indexing regions on the sphere natively, they all do provide ways to index 3-d bounding boxes, and the lsst.sphgeom objects we work with provide ways to get 3-d bounding boxes (for a unit sphere).
The interfaces for the three databases are quite different, however, so the big challenge is probably encapsulating that functionality behind a consistent interface for the rest of our Registry code to use. We might be able to leverage SQLAlchemy hooks for some of this customization, but I doubt they will cover everything. While Oracle and PostgreSQL both treat geometries as column types and spatial indexes much like normal indexes, they have different syntax for inserting, querying, and relating them. SQLite is even more different: its R tree implementation involves creating a special table for every spatial column, in which lower and upper bounds behave like regular (but separate) columms. Adding at least some of the abstraction to our Database class might be a lot easier, and no less usable, though at present it doesn't do much with SELECT queries (they're just passed through to SQLAlchemy directly). And I don't think this functionality needs to be totally transparent - just database-generic. It would be totally reasonable for Registry code that's using our Database interface to work with a spatial column to have to do things it wouldn't have to do with a normal column, like manually extracting bounding boxes from regions for inserts, calling Database methods to get SQLAlchemy expressions that represent overlaps in query join/where clauses, or doing post-query filtering in Python (as we do now).
This ticket is for adding that abstraction layer to some combination of SQLAlchemy and the Database class. I'll create another one for actually using it in Registry (maybe after the design for this one is further along).
Conversations on slack revealed that my initial investigation missed some built-in and extension database functionality we should at least investigate first before diving into this scheme.
The most obvious one is that Oracle Spatial does support expressing geometries on the sphere natively and Christopher Stephens [X] has a contact at NCSA who has used that successfully in astronomy. We should check that this includes indexing spatial regions for overlap queries and joins (not just point-in-region queries) before diving in, but if it does, it's probably the way to go there. There's a slight chance that the Oracle spherical stuff is actually slower than just using the Oracle 3-d stuff directly, because it can also do ellipsoids and heights (a lot of sophistication we don't need), but hopefully the code path that needs spherical but doesn't need those things is exercised enough by non-astronomers to be reasonably optimized.
I had already known of pgSphere, but hadn't considering it worth exploring until/unless profiling revealed we needed it, as using the scheme described earlier for all databases seemed easier. If we have a more native approach for Oracle, it may be worth using pgSphere for PostgreSQL, too, though its maintenance situation does not seem ideal. PostGIS also now seems to have support for spherical geometries, but this "GEOGRAPHY" type seems largely distinct from its much more broadly-used and complete 2-d map projection functionality.
SpatiaLite aims to bring PostGIS-like functionality to SQLite, and it's unclear whether it has anything useful for us. It certainly has a lot of 2-d map projection geometry functionality, but I have not been able to wade through all of the GIS jargon and acronyms to determine whether it can actually do indexing over a significant fraction of the sphere.
On that "2-d map projection functionality" point: this seems to be the approach that is big in the open-source GIS world, and I've always assumed it just wouldn't work for us, because we just can't put all of the regions we might want to relate onto the same 2-d map projection (and I've never seen it used in astronomy). But there's so much stuff to wade through that's not of interest to us that it's very hard to tell whether there is any stuff that is of use to us. Actually asking on some email lists may teach us a lot more than a similar amount of time spent poring over docs; I already found one such thread involving an astronomer that helped a lot, though it was about distance calculations rather than region overlaps and doesn't directly answer the most critical questions for us.