Here's a summary of what I'm thinking of doing.
1. Enhance sphgeom package:
- Implement convex polygon intersection testing
- Add binary and/or textual IO for the geometric primitives
- Remove ellipse support
- Add a cartesian 3d bounding box type and methods to compute it for the various geometric primitives
- Transform into a standard stack package: ditch the custom build system, use sconsUtils and the standard stack package layout, and wrap the C++ using SWIG.
2. Add an ingestion task which, given an input dataset that contains exposures, either adds a spatial index for those exposures to the sqlite3 repository registry or produces an output repository with a sqlite3 index as an output dataset. I'm not sure which is preferable, and I'll likely need some guidance implementing either one. In somewhat more detail, this would work as follows:
- one table would contain the exposure ID, complete data ID columns, and some representation of the boundary polygon for each exposure.
- another would contain the 3d bounding box and exposure ID for each exposure, and would be an sqlite3 R*Tree index.
- we would either register polygon overlap test functions through the Python sqlite3 API and post-filter the R*Tree search results in SQL, or post-filter them in Python.
The last point motivates the Python wrapper. I could theoretically save myself a bunch of work by making the Python interface a single function that takes an sqlite3.Connection object and registers a bunch of UDFs. But that requires extracting the sqlite3 * C pointer from a PyObject sub-class defined in a C header from pysqlite that I almost certainly do not have access to. I don't see how to do this in a reasonable way.
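To make the two-table layout and the UDF option a bit more concrete, here is a rough sketch using only the Python sqlite3 module. Everything named here is an assumption for illustration: the table and column names, the visit/ccd data ID columns, and the JSON encoding of the boundary polygon. The polygons_overlap function is only a stand-in for what sphgeom would provide (it checks vertex containment and so misses polygons whose edges cross without containing a vertex), and the bounding box is computed from the vertices alone, which slightly underestimates the true box because great-circle edges bulge outward. If the SQLite build lacks the R*Tree module, the sketch falls back to an ordinary table.

```python
import json
import math
import sqlite3


def lonlat_box(lon0, lon1, lat0, lat1):
    """Hypothetical helper: counter-clockwise unit-vector vertices of a small box (degrees)."""
    def unit(lon, lat):
        lon, lat = math.radians(lon), math.radians(lat)
        return [math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat)]
    return [unit(lon0, lat0), unit(lon1, lat0), unit(lon1, lat1), unit(lon0, lat1)]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _contains(poly, p):
    # p is inside a convex, counter-clockwise polygon if it lies on the
    # inner side of the great-circle plane through every edge.
    n = len(poly)
    return all(_dot(_cross(poly[i], poly[(i + 1) % n]), p) >= 0.0 for i in range(n))


def polygons_overlap(a_json, b_json):
    """Stand-in overlap test (vertex containment only); sphgeom would replace this."""
    a, b = json.loads(a_json), json.loads(b_json)
    return int(any(_contains(a, v) for v in b) or any(_contains(b, v) for v in a))


def vertex_bbox(poly):
    """3d bounding box of the vertices only (not of the full spherical polygon)."""
    xs, ys, zs = zip(*poly)
    return (min(xs), max(xs), min(ys), max(ys), min(zs), max(zs))


def create_schema(conn):
    # Table 1: exposure ID, data ID columns, and the boundary polygon.
    conn.execute("CREATE TABLE exposure (exposure_id INTEGER PRIMARY KEY, "
                 "visit INTEGER, ccd INTEGER, boundary TEXT)")
    # Table 2: exposure ID plus 3d bounding box, as an R*Tree index if available.
    cols = "exposure_id, x_min, x_max, y_min, y_max, z_min, z_max"
    try:
        conn.execute("CREATE VIRTUAL TABLE exposure_bbox USING rtree(%s)" % cols)
    except sqlite3.OperationalError:
        conn.execute("CREATE TABLE exposure_bbox (%s)" % cols)
    # Register the overlap test as a two-argument SQL function.
    conn.create_function("polygons_overlap", 2, polygons_overlap)


def add_exposure(conn, exposure_id, visit, ccd, poly):
    conn.execute("INSERT INTO exposure VALUES (?, ?, ?, ?)",
                 (exposure_id, visit, ccd, json.dumps(poly)))
    conn.execute("INSERT INTO exposure_bbox VALUES (?, ?, ?, ?, ?, ?, ?)",
                 (exposure_id,) + vertex_bbox(poly))


def find_overlapping(conn, region):
    """Coarse box search on the index, then post-filter with the UDF in SQL."""
    x0, x1, y0, y1, z0, z1 = vertex_bbox(region)
    rows = conn.execute(
        "SELECT e.visit, e.ccd FROM exposure_bbox AS b "
        "JOIN exposure AS e ON e.exposure_id = b.exposure_id "
        "WHERE b.x_max >= ? AND b.x_min <= ? AND b.y_max >= ? AND b.y_min <= ? "
        "AND b.z_max >= ? AND b.z_min <= ? AND polygons_overlap(e.boundary, ?) "
        "ORDER BY e.exposure_id",
        (x0, x1, y0, y1, z0, z1, json.dumps(region)))
    return [{"visit": v, "ccd": c} for v, c in rows]
```

Post-filtering in Python instead would mean dropping the polygons_overlap term from the WHERE clause, selecting e.boundary as well, and applying the test to each returned row.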
3. Add one or more selection tasks (like selectSdssImages.py) which, given a spatial region specification (an explicit sky polygon or, more indirectly, a coadd patch id), return a list of data IDs of overlapping exposures. Such a task could have multiple backends, e.g. one that runs queries against MySQL exposure tables that have been ingested with ingestProcessed.py from datarel, and another that queries a repo registry/sqlite3 output dataset.
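As a rough illustration of the multiple-backend idea, the sketch below separates the selection logic from where the candidate exposures come from. All class, function, table, and column names are hypothetical, the region is reduced to a 3d box (x_min, x_max, y_min, y_max, z_min, z_max), and the exact polygon test is left to a caller-supplied predicate. The in-memory backend merely stands in for a second source such as a MySQL exposure table; it is not a MySQL client.

```python
import sqlite3
from abc import ABC, abstractmethod


def boxes_overlap(a, b):
    """True if two (x_min, x_max, y_min, y_max, z_min, z_max) boxes intersect."""
    return all(a[2 * i] <= b[2 * i + 1] and b[2 * i] <= a[2 * i + 1] for i in range(3))


class SelectionBackend(ABC):
    """Hypothetical interface: supply candidate exposures for a 3d box."""

    @abstractmethod
    def candidates(self, box):
        """Return (data_id, boundary) pairs whose bounding box overlaps box."""


class SqliteBackend(SelectionBackend):
    """Queries a hypothetical exposure_index table through the sqlite3 module."""

    def __init__(self, conn):
        self.conn = conn

    def candidates(self, box):
        x0, x1, y0, y1, z0, z1 = box
        rows = self.conn.execute(
            "SELECT visit, ccd, boundary FROM exposure_index "
            "WHERE x_max >= ? AND x_min <= ? AND y_max >= ? AND y_min <= ? "
            "AND z_max >= ? AND z_min <= ? ORDER BY visit, ccd",
            (x0, x1, y0, y1, z0, z1))
        return [({"visit": v, "ccd": c}, b) for v, c, b in rows]


class ListBackend(SelectionBackend):
    """In-memory stand-in for another backend (e.g. a MySQL exposure table)."""

    def __init__(self, records):
        self.records = records  # list of (data_id, box, boundary) tuples

    def candidates(self, box):
        return [(d, b) for d, ebox, b in self.records if boxes_overlap(ebox, box)]


def select_exposures(backend, box, overlaps=lambda boundary: True):
    """Return data IDs of candidates that also pass the exact overlap predicate."""
    return [d for d, boundary in backend.candidates(box) if overlaps(boundary)]
```

The selection task itself would then only need to turn its region specification (sky polygon or coadd patch id) into a box plus a predicate and hand both to whichever backend is configured.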
Does this sound reasonable to you? Any thoughts are very welcome,