Fix Version/s: None
This follows the pattern used for other Registry systems. After this change, we could consider giving Datastore instances these classes and a manager for the opaque tables instead of a full Registry, would be better for interface segregation.
The default implementation of these manager/storage classes should not modify the current schema, but it will give us flexibility to change it via configuration in the future, so it probably still merits being part of the stability milestone.
The dataset_location table was meant to be a registry level table that is public and allows registry to answer general questions about whether a dataset is stored anywhere and how many copies of a dataset exist if we use chained datastores. The posix_datastore_records table is designed to be entirely internal and registry is an implementation detail. I'm fine with datastore using a more constrained class for writing to the registry – I assume it will effectively have zero impact on datastore other than being more explicit about the interface.
A fake connection object for InMemoryDatastore is fine with me – the interface to that table is already factored out in datastore in genericDatastore.py. No-one is really using the dataset_location tables for their intended use other than datastore anyhow. We already use dict in place of registry for internal records so wouldn't need a fancy connection object for dealing with what we currently call "opaqueData".
The code that moves datasets from dataset_location to dataset_location_trash is shared by all datastores so it's not a big job to switch that to something else. What do we really gain by having a table per datastore rather than using the name in the datastore as a column. On the one hand it becomes an internal implementation detail (like posix_datastore_records), but on the other hand you have to worry a bit more about transactions and you can't query one place for where all the trashed but somehow not trashed yet datasets are listed.
I was planning for Registry to keep the DatasetLocation methods that return information to the user, and just remove the ones that modify the information.
The opaque table interface would also remain on Registry because it might be of more general use, though I would also put those on the bridge object so Datastores don't need to carry around two things.
For the trash tables, I was thinking that it might be better if they were a Datastore implementation detail (but like the opaque tables, something they might delegate back to Registry, via the bridge object). InMemoryDatastore and ChainedDatastore don't need one, and an in-the-same-database SQL-based Datastore probably wouldn't need one either. And whether to have one or use the set difference with an internal table could also be an implementation (or configuration) choice for Posix and S3. But my reasoning for separate trash tables was just that sort of separation-of-concerns musing, not something I think would be more efficient or safe.
Tim Jenness, I gather you're slammed with reviews (and probably more), but I think you're the only viable reviewer for this one. No rush - I've got orthogonal tickets I can work on - and unless you think it'll be a long wait I'm happy to rebase on whatever typing changes you've made before merge (or even your review).
This looks really good to me. I like how you use the ephemeral version for testing. I'm fine with you merging it and me trying to rebase.
Tim Jenness, I'm contemplating making some changes to the Datastore/Registry boundary (code, not conceptual) that I wanted to run by you first: