  1. Data Management
  2. DM-35532

Investigate feasibility of ObsCore as a view on Butler registry

Details

    • 4
    • Data Access and Database
    • No

    Description

      When data are taken, the files are transferred to the USDF and ingested into a butler registry. A lot of the general tooling developed by SQuaRE relies on ObsTAP queries of an ObsCore database table. Currently we generate the content of the ObsCore table by doing an explicit export of a butler registry using dax_obscore. This works, but it is an additional step that would have to run as part of the automated data ingest, explicitly exporting the records for the new raws and adding them to the ObsCore table. Presumably some data processing will also happen at USDF and we would like, for example, calexp images to be available to other systems as soon as the processing completes. Do we want to have to do an ObsCore export/import every time a batch job completes?

      Things would be simpler if we could have an ObsCore view of the butler registry that automatically updates when files are added or deleted. This ticket is to investigate the feasibility of this and to write a tech note with options.
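
As a rough illustration of what "an ObsCore view that automatically updates" could mean at the database level, here is a minimal sketch using SQLite. The `exposure` and `dataset` tables, the `HypoCam` instrument, and the `hypothetical:` identifier prefix are all assumptions made for the example; they are heavily simplified stand-ins and do not reproduce the real butler registry schema or the dax_obscore output. Only the ObsCore column names in the view are taken from the ObsCore data model.

```python
import sqlite3

# Toy schema: two simplified, hypothetical "registry" tables and one view
# that exposes a handful of ObsCore columns computed from them.
SCHEMA = """
CREATE TABLE exposure (
    instrument TEXT NOT NULL,
    id INTEGER NOT NULL,
    t_min REAL NOT NULL,
    t_max REAL NOT NULL,
    PRIMARY KEY (instrument, id)
);
CREATE TABLE dataset (
    id INTEGER PRIMARY KEY,
    dataset_type TEXT NOT NULL,
    instrument TEXT NOT NULL,
    exposure INTEGER NOT NULL
);
CREATE VIEW obscore AS
SELECT
    'image' AS dataproduct_type,
    d.instrument || '/' || d.exposure AS obs_id,
    'hypothetical:' || d.dataset_type || '/' || d.id AS obs_publisher_did,
    d.instrument AS instrument_name,
    e.t_min AS t_min,
    e.t_max AS t_max
FROM dataset AS d
JOIN exposure AS e ON e.instrument = d.instrument AND e.id = d.exposure;
"""


def make_toy_registry():
    """Create an in-memory database holding the toy tables and the view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def obscore_rows(conn):
    """Return a few ObsCore columns for every row currently in the view."""
    cursor = conn.execute(
        "SELECT obs_publisher_did, instrument_name, t_min, t_max "
        "FROM obscore ORDER BY obs_publisher_did"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = make_toy_registry()
    conn.execute("INSERT INTO exposure VALUES ('HypoCam', 42, 59800.0, 59800.5)")
    conn.execute("INSERT INTO dataset VALUES (1, 'raw', 'HypoCam', 42)")
    print(obscore_rows(conn))  # the new dataset is visible with no export step
    conn.execute("DELETE FROM dataset WHERE id = 1")
    print(obscore_rows(conn))  # and it disappears again once deleted
```

The view needs no export/import step because it is recomputed from the underlying tables on every query. What this toy version leaves out is exactly what makes the real case harder: spatial regions that need an indexable representation, and restricting the view to a particular set of collections.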

          Activity

            gpdf Gregory Dubois-Felsmann added a comment - Just FYI, please see DM-35740 for a separate issue regarding the usability of the Registry-to-ObsCore conversion code, and a possible refactoring. It might be worth including in whatever architecture is developed to implement the client-side trigger model.

            gpdf Gregory Dubois-Felsmann added a comment - Final comment: right now dax_obscore has very limited dependencies, which makes it usable in non-Rubin applications (which is what we will do in SPHEREx). I have no specific reason to worry that this might change, but please do try to keep these dependencies slim if you develop the client-side trigger model in practice. We would certainly use it immediately in SPHEREx.

            salnikov Andy Salnikov added a comment - gpdf, I agree that some of the region formatting job can be shifted to the TAP service. Still, to be efficient for spatial queries, we need the region in a database-native representation suitable for indexing. The trouble is, of course, that database-native is also database-specific, and it would be nice to avoid backend-specific features in TAP. I think this is not a very hard problem to solve; it just needs some attention.

            For the find-first issue and client-side "trigger", the insertion test should be easy: one just has to check that the run collection for the new dataset ref is a member of the configured chained collection. There are more interesting cases for removal of datasets, either as an explicit dataset purge or as removal of a run collection from a chained collection. I'll try to expand my technote with ideas for those cases.
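
A minimal sketch of the insertion test described in this comment, and of one of the removal cases, using a toy model of chained collections. `CollectionGraph`, `should_export`, `runs_dropped_by_unlink`, and the collection names are all hypothetical and are not butler APIs; the sketch also assumes chains contain no cycles and treats any name that is not a chain as a run collection.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionGraph:
    """Toy model: maps a chained collection name to its ordered children."""

    chains: dict[str, list[str]] = field(default_factory=dict)

    def flatten(self, name: str) -> list[str]:
        """Return the run collections reachable from `name`, in search order."""
        children = self.chains.get(name)
        if children is None:
            # Not a known chain: treat it as a run collection.
            return [name]
        runs: list[str] = []
        for child in children:
            for run in self.flatten(child):
                if run not in runs:
                    runs.append(run)
        return runs


def should_export(graph: CollectionGraph, configured_chain: str, dataset_run: str) -> bool:
    """Insertion test: is the new dataset's run part of the configured chain?"""
    return dataset_run in graph.flatten(configured_chain)


def runs_dropped_by_unlink(
    graph: CollectionGraph, configured_chain: str, parent: str, child: str
) -> list[str]:
    """Runs that leave the configured chain if `child` is removed from `parent`."""
    before = graph.flatten(configured_chain)
    trimmed = {
        name: [c for c in children if not (name == parent and c == child)]
        for name, children in graph.chains.items()
    }
    after = CollectionGraph(trimmed).flatten(configured_chain)
    return [run for run in before if run not in after]


if __name__ == "__main__":
    graph = CollectionGraph(
        {
            "HypoCam/defaults": ["HypoCam/raw/all", "HypoCam/processing"],
            "HypoCam/processing": ["u/someone/run1"],
        }
    )
    print(should_export(graph, "HypoCam/defaults", "u/someone/run1"))
    print(runs_dropped_by_unlink(
        graph, "HypoCam/defaults", "HypoCam/defaults", "HypoCam/processing"
    ))
```

The removal helper only identifies which runs stop being visible; the ObsCore records belonging to those runs would still have to be found and deleted, which is the part the comment flags as more interesting. It also does not address find-first semantics, where a dataset in an earlier run can shadow one in a later run.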

            salnikov Andy Salnikov added a comment - I have added one more section to the technote with a bunch of random ideas for client-side implementation.

            tjenness, do you want to re-review it, or can you just say it's OK to merge as it is?

            What I'm not sure about is how decoupled we can make this. Should butler have some kind of system that lets you add callbacks that are passed a DatasetRef?

            My current idea is to make it a new optional manager that will be called by other managers. We could then re-use all our tooling for optional loading of that manager and for schema/data migration.
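
The two options raised in this comment (callbacks that receive a DatasetRef, or an optional manager called by other managers) share the same basic shape, which can be sketched as follows. Everything here is hypothetical: `FakeDatasetRef`, `DatasetListener`, `ToyObsCoreManager`, and `ToyRegistry` are illustrative names, not butler classes, and the "ObsCore record" is reduced to a small dictionary.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class FakeDatasetRef:
    """Hypothetical stand-in for a butler DatasetRef."""

    dataset_id: int
    dataset_type: str
    run: str


class DatasetListener(Protocol):
    """Interface the registry calls when datasets are added or removed."""

    def on_insert(self, ref: FakeDatasetRef) -> None: ...

    def on_delete(self, ref: FakeDatasetRef) -> None: ...


class ToyObsCoreManager:
    """Optional listener that keeps an in-memory ObsCore-like table in sync."""

    def __init__(self, dataset_types: Iterable[str], runs: Iterable[str]) -> None:
        self._dataset_types = set(dataset_types)
        self._runs = set(runs)
        self.records: dict[int, dict[str, str]] = {}

    def on_insert(self, ref: FakeDatasetRef) -> None:
        # Only configured dataset types in configured runs are exported.
        if ref.dataset_type in self._dataset_types and ref.run in self._runs:
            self.records[ref.dataset_id] = {
                "dataset_type": ref.dataset_type,
                "run": ref.run,
            }

    def on_delete(self, ref: FakeDatasetRef) -> None:
        # Deleting a dataset that was never exported is a no-op.
        self.records.pop(ref.dataset_id, None)


class ToyRegistry:
    """Registry stand-in that notifies zero or more listeners on each change."""

    def __init__(self, listeners: Iterable[DatasetListener] = ()) -> None:
        self._listeners = list(listeners)
        self._datasets: dict[int, FakeDatasetRef] = {}

    def insert(self, ref: FakeDatasetRef) -> None:
        self._datasets[ref.dataset_id] = ref
        for listener in self._listeners:
            listener.on_insert(ref)

    def remove(self, dataset_id: int) -> None:
        ref = self._datasets.pop(dataset_id)
        for listener in self._listeners:
            listener.on_delete(ref)


if __name__ == "__main__":
    obscore = ToyObsCoreManager(dataset_types=["raw"], runs=["HypoCam/raw/all"])
    registry = ToyRegistry([obscore])
    registry.insert(FakeDatasetRef(1, "raw", "HypoCam/raw/all"))
    registry.insert(FakeDatasetRef(2, "bias", "HypoCam/calib"))
    print(sorted(obscore.records))
    registry.remove(1)
    print(sorted(obscore.records))
```

Because the registry depends only on the listener interface, a registry constructed with no listeners behaves as before, which is the sense in which such a manager would be optional. Whether the hook lives in a generic callback list or in a dedicated manager is the open design question in the comment; this sketch does not settle it.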

            salnikov Andy Salnikov added a comment - Thanks for the review. Merged; the result is here: https://dmtn-236.lsst.io/

            People

              salnikov Andy Salnikov
              tjenness Tim Jenness
              Tim Jenness
              Andy Salnikov, Frossie Economou, Gregory Dubois-Felsmann, Jim Bosch, Kian-Tat Lim, Tim Jenness