I'm having some misgivings about where this is going, in terms of dependencies and complexity:
- I'm still not thrilled about adding a new spatial query system in parallel to the one we already have, based on not just PostgreSQL-specific but pgSphere-specific types, all because we are tied to a particular TAP implementation. If we're talking about this going in a manager implementation in daf_butler, those are pretty unfortunate dependencies to be adding (even implicitly, in the case of the TAP implementation), and while they'd be optional dependencies for users, they'd have to be considered required for developers due to the need to test that we don't break the ObsCore functionality.
- Guaranteeing consistency between Registry and Datastore during dataset writes and especially deletes is already super tricky and something we do not do as well as we should. Having a new hook in play for those operations will make that important work harder to get right, especially if they involve Python logic. I suspect they would at least need to be able to handle rollbacks rigorously, and that could be very hard, especially if we can't rely fully on ON DELETE CASCADE.
- If regions need to come from (e.g.) calexp WCSs instead of the Registry, we're actually talking about a Registry manager that knows not just about Datastore contents but the Exposure storage class as well. That's not something we want to put in daf_butler, especially since any test that exercises that region creation would need access to formatters defined over in obs_base (based on code in afw).
- Even if the manager implementation lived in obs_base (or some new package at around the same point in the package structure), with only an ABC for it in daf_butler, actually running that region-creation code is going to be pretty I/O heavy, and having it triggered deep inside butler when inserting datasets into particular collections means in practice that it's going to be run by processes like BPS transfer jobs, which we actually want to be quite focused on their main tasks, and not slowed down (or possibly disrupted) by add-on functionality.
All that makes me think that "live ObsCore" is just not going to work, at least not in terms of being able to provide regions (or, ideally, anything that involves inserts into a separate table rather than a true SQL view). I know Gregory Dubois-Felsmann has said in the past that the "live" ObsCore doesn't need to be fully-featured, and I wonder if "whatever we can define via an actual SQL view" includes enough to be viable as what we support live. We could support that via new hooks in daf_butler for creating views or introspecting the actual SQL schema generated by the configured managers, and then let code in obs_base or a new package actually define the view when run (and then, being a view, it'd naturally stay updated). We could similarly support a more fully-featured, non-live ObsCore by beefing up the opaque-table interfaces as necessary for other downstream code to augment the views with actual new tables. And if we need the non-live ObsCore view to behave like it's live in some specific context (e.g. for summit observers), then that sounds like it should really be a new service that maintains that view. As I understand it, we have a whole observatory event-handling system for that sort of thing. And we already have precedent here: inserting rows into an ObsCore table based on content in other butler tables feels a lot like inserting rows into the visit (and related) tables based on content in the exposure table, and we have never expected butler to be the thing that triggers visit definition.
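To make the "true SQL view" point concrete, here is a minimal sketch using SQLite from Python. The table names, columns, and the `HypoCam` instrument are hypothetical stand-ins, not the real butler schema, and the mapping onto ObsCore column names is only an assumption for illustration. The point is just that a view needs no insert/delete hook and follows rollbacks for free:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-ins for butler-managed tables (NOT the real schema).
    CREATE TABLE visit (
        instrument TEXT NOT NULL,
        id INTEGER NOT NULL,
        t_min REAL NOT NULL,
        t_max REAL NOT NULL,
        PRIMARY KEY (instrument, id)
    );
    CREATE TABLE dataset (
        id INTEGER PRIMARY KEY,
        dataset_type TEXT NOT NULL,
        instrument TEXT NOT NULL,
        visit INTEGER NOT NULL
    );
    -- Hypothetical "live" ObsCore subset: only what a plain join can provide.
    CREATE VIEW obscore_live AS
        SELECT d.id           AS obs_id,
               d.dataset_type AS dataproduct_subtype,
               d.instrument   AS instrument_name,
               v.t_min        AS t_min,
               v.t_max        AS t_max
        FROM dataset d
        JOIN visit v ON v.instrument = d.instrument AND v.id = d.visit;
    """
)


def obs_ids(connection):
    """Return the obs_id values currently visible through the view."""
    rows = connection.execute("SELECT obs_id FROM obscore_live ORDER BY obs_id")
    return [row[0] for row in rows]


# Inserts into the underlying tables show up in the view with no extra hook.
conn.execute("INSERT INTO visit VALUES ('HypoCam', 1, 59000.0, 59000.1)")
conn.execute("INSERT INTO dataset VALUES (10, 'calexp', 'HypoCam', 1)")
conn.execute("INSERT INTO dataset VALUES (11, 'calexp', 'HypoCam', 1)")
after_insert = obs_ids(conn)

# Deletes are reflected the same way; there is no second table to clean up.
conn.execute("DELETE FROM dataset WHERE id = 10")
after_delete = obs_ids(conn)
conn.commit()

# A rolled-back insert never appears, without any rollback logic of our own.
conn.execute("INSERT INTO dataset VALUES (12, 'calexp', 'HypoCam', 1)")
conn.rollback()
after_rollback = obs_ids(conn)
```

Anything that cannot be written as a join like this (regions computed from a calexp WCS, for example) is exactly the part that would need a separate table and something outside butler to populate it.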
As a side note, if we need to get the calexp (etc.) WCS/bbox/region information into the database to allow all of this to work without a ton of new I/O, then I think we need to bite the bullet and tackle DM-21773 and add per-StorageClass metadata to Registry that's populated by the formatters.