Fix Version/s: None
Component/s: daf_butler, dax_obscore
Team: Data Access and Database
When data are taken, the files are transferred to the USDF and ingested into a butler registry. Much of the general tooling developed by SQuaRE relies on ObsTAP queries of an ObsCore database table. Currently we generate the content of the ObsCore table by doing an explicit export of a butler registry using dax_obscore. This works, but it is an additional step that would have to run as part of the automated data ingest, explicitly exporting the records for the new raws and adding them to the ObsCore table. Presumably some data processing will also happen at the USDF, and we would like, for example, calexp images to be available to other systems as soon as the processing completes. Do we want to have to do an ObsCore export/import every time a batch job completes?
Things would be simpler if we could have an ObsCore view of the butler registry that automatically updates when files are added or deleted. This ticket is to investigate the feasibility of this and to write a tech note with options.
- is triggering
DM-35850 Collect ideas and requirements for daf_butler obscore implementation
- relates to
DM-37275 Establish ObsTAP service for use with a Postgres Butler registry
- To Do
Final comment: right now dax_obscore has very limited dependencies, which makes it usable in non-Rubin applications (as we will do in SPHEREx). I have no specific reason to worry that this might change, but please do try to keep these dependencies slim if you develop the client-side trigger model in practice. We would certainly use it immediately in SPHEREx.
Gregory Dubois-Felsmann, I agree that some of the region-formatting work can be shifted to the TAP service. Still, to be efficient for spatial queries, we need the region in a database-native representation suitable for indexing. The trouble is, of course, that database-native is also database-specific, and it would be nice to avoid backend-specific features in TAP. I think this is not a very hard problem to solve; it just needs some attention.
For the find-first issue and the client-side "trigger", the insertion test should be easy: one just has to check that the run collection of the new dataset ref is a member of the configured chained collection. The more interesting cases are for removal of datasets, either as an explicit dataset purge or as removal of a run collection from a chained collection. I'll try to expand my technote with ideas for those cases.
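The insertion test described above could look roughly like the sketch below. It is only an illustration and does not use the real daf_butler API: `DatasetRefStub`, `flatten_chain`, `should_export`, the `chains` mapping, and all collection names are hypothetical stand-ins. It assumes chained collections may nest but are never cyclic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRefStub:
    """Hypothetical stand-in for a dataset ref: only the fields the test needs."""

    dataset_type: str
    run: str


def flatten_chain(name: str, chains: dict[str, list[str]]) -> list[str]:
    """Expand a collection name into its ordered list of run collections.

    `chains` maps a chained collection name to its ordered children; any name
    that is not a key is treated as a run collection. Assumes no cycles.
    """
    children = chains.get(name)
    if children is None:
        return [name]
    flat: list[str] = []
    for child in children:
        for run in flatten_chain(child, chains):
            if run not in flat:  # keep first occurrence, preserving search order
                flat.append(run)
    return flat


def should_export(ref: DatasetRefStub, chain_name: str, chains: dict[str, list[str]]) -> bool:
    """Insertion test: is the ref's run a member of the configured chain?"""
    return ref.run in flatten_chain(chain_name, chains)


# Hypothetical collection layout, for illustration only.
chains = {
    "obscore/chain": ["raw/all", "processing/chain"],
    "processing/chain": ["runs/batch-2", "runs/batch-1"],
}
```

With this layout, a ref in `runs/batch-1` passes the test and a ref in an unrelated run does not. The removal cases are harder because removing a run from the chain can change which dataset is found first, which this membership check alone does not address.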
I have added one more section to the technote with a bunch of random ideas for client-side implementation.
Tim Jenness, do you want to re-review it? (Or you can just say it's OK to merge as is.)
What I'm not sure about is how decoupled we can make this. Should butler have some kind of system that lets you add callbacks that are passed a DatasetRef?
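To make the callback question concrete, here is a minimal sketch of what such a hook system might look like. Nothing here exists in butler: `RefStub`, `DatasetHooks`, and the method names are hypothetical, and a real version would need to decide how callback failures interact with registry transactions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RefStub:
    """Hypothetical stand-in for DatasetRef."""

    dataset_type: str
    run: str


Callback = Callable[[RefStub], None]


class DatasetHooks:
    """Hypothetical registry of callbacks fired on dataset insert and delete."""

    def __init__(self) -> None:
        self._on_insert: list[Callback] = []
        self._on_delete: list[Callback] = []

    def add_insert_callback(self, callback: Callback) -> None:
        self._on_insert.append(callback)

    def add_delete_callback(self, callback: Callback) -> None:
        self._on_delete.append(callback)

    def notify_insert(self, refs: Iterable[RefStub]) -> None:
        # Call every registered insert callback once per ref, in registration order.
        for ref in refs:
            for callback in self._on_insert:
                callback(ref)

    def notify_delete(self, refs: Iterable[RefStub]) -> None:
        for ref in refs:
            for callback in self._on_delete:
                callback(ref)


# Example: an ObsCore-like consumer that just records what it was told about.
seen: list[tuple[str, str]] = []
hooks = DatasetHooks()
hooks.add_insert_callback(lambda ref: seen.append(("insert", ref.run)))
hooks.add_delete_callback(lambda ref: seen.append(("delete", ref.run)))
hooks.notify_insert([RefStub("raw", "raw/all")])
hooks.notify_delete([RefStub("raw", "raw/all")])
```

The appeal of this shape is that butler would not need to know anything about ObsCore; the cost is that per-ref callbacks make batching and transactional rollback harder.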
I think my current idea is to make it a new optional manager that will be called by other managers. We then could re-use all our tooling for optional loading of that manager and schema/data migration.
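As a rough illustration of the optional-manager idea (not the actual daf_butler manager interfaces), the sketch below shows a datasets manager that forwards inserts and deletes to an ObsCore manager only when one is configured. The class names, the `load_managers` helper, and the `"obscore"` configuration key are all assumptions made for the example.

```python
from typing import Optional

# A ref is modelled here as a (dataset_type, run) tuple; purely illustrative.
Ref = tuple[str, str]


class ObsCoreManagerSketch:
    """Hypothetical optional manager that maintains ObsCore-like rows."""

    def __init__(self) -> None:
        self.rows: list[dict[str, str]] = []

    def add_datasets(self, refs: list[Ref]) -> int:
        for dataset_type, run in refs:
            self.rows.append({"dataset_type": dataset_type, "run": run})
        return len(refs)

    def remove_datasets(self, refs: list[Ref]) -> int:
        doomed = set(refs)
        before = len(self.rows)
        self.rows = [r for r in self.rows if (r["dataset_type"], r["run"]) not in doomed]
        return before - len(self.rows)


class DatasetsManagerSketch:
    """Hypothetical datasets manager that calls the optional manager if present."""

    def __init__(self, obscore: Optional[ObsCoreManagerSketch] = None) -> None:
        self._obscore = obscore
        self.datasets: list[Ref] = []

    def insert(self, refs: list[Ref]) -> None:
        self.datasets.extend(refs)
        if self._obscore is not None:  # optional manager not loaded: plain no-op
            self._obscore.add_datasets(refs)

    def delete(self, refs: list[Ref]) -> None:
        self.datasets = [d for d in self.datasets if d not in refs]
        if self._obscore is not None:
            self._obscore.remove_datasets(refs)


def load_managers(config: dict) -> tuple[DatasetsManagerSketch, Optional[ObsCoreManagerSketch]]:
    """Create the ObsCore manager only if the (hypothetical) config key is set."""
    obscore = ObsCoreManagerSketch() if config.get("obscore") else None
    return DatasetsManagerSketch(obscore), obscore
```

A registry configured without the `"obscore"` key behaves exactly as before, which is what would allow the manager to be loaded optionally and migrated with the existing tooling.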
Thanks for the review; merged. The result is here: https://dmtn-236.lsst.io/
Just FYI, please see DM-35740 for a separate issue regarding the usability of the Registry-to-ObsCore conversion code, and a possible refactoring. It might be worth including in whatever architecture is developed to implement the client-side trigger model.