Fix Version/s: None
Component/s: daf_butler, dax_obscore
Team: Data Access and Database
Time to start implementing the ObsCore table as part of the daf_butler Registry. Before doing any actual work I want to spend some time understanding all the requirements and options that we need for that. The TechNote already lists a few ideas for implementation; I'm going to expand on those in more detail and also try to figure out the issues related to schema migration here. Also, dax_obscore already has some parts that could be reused.
- is triggered by
DM-35532 Investigate feasibility of ObsCore as a view on Butler registry
- is triggering
DM-35947 Implement live obscore table updates in daf_butler
Couple of quick comments:
- Using pgsphere means we have to support our own postgres deployment on Google since CloudSQL does not support it (we wouldn't even be able to request it given no-one can agree on which pgsphere we are meant to use). Maybe we don't need any of this to run on Google because a data release will be using Qserv and we only want this to run at USDF and the summit.
- Storing the direct URIs means that we are never going to do composite disassembly. It now seems pretty clear that no-one wants to do composite disassembly so I should give up the idea. The links service at least gave the illusion that on-the-fly assembly was a thing that could happen (along with on-the-fly format conversion).
- Direct URIs also mean that the ObsCore manager has to be able to talk to the datastore to get that URI (previously ObsCore was entirely registry).
- Yes, the pgsphere issue does complicate cloud deployment. We've looked at that at IPAC as well in some contexts, w.r.t. the Amazon cloud (because of NASA's deal with Amazon). Amazon has pre-packaged configurations of Postgres including PostGIS, but not pgsphere. At some point the astronomy community might be well advised to find the resources, in some project, to build a full-featured ADQL solution against PostGIS, perhaps including adding additional capabilities to PostGIS itself. pgsphere, even if it is currently still being maintained, is looking like a fragile reed.
- I have been assuming that in DRP releases we would need ObsCore tables both in Qserv (for use/joining with the catalogs, as we're doing on DP0.2) and in Postgres, as part of the CDB, for joining with observatory metadata. So I think we'll need TAP-with-geometry on Postgres even in DRP.
- The PPDB is by default Postgres, I think, and we'll certainly need an ObsCore table for the AP image data products, ideally joinable with the PPDB catalog contents.
- I am not suggesting that we would ever use the "direct" model on DRP outputs. I just don't want to foreclose it in the code itself. If composite disassembly becomes something we really need to do, I wouldn't object to it on the grounds that it prevents using the "direct" model.
... lsst_visit, lsst_detector, lsst_tract, lsst_patch, lsst_band, and lsst_filter columns
Gregory Dubois-Felsmann, indeed, I assume we want to keep these, and they (and probably some others) can be added via simple configuration. One issue here is that any configuration change that alters the obscore table schema will need some coordination, so we should try to anticipate everything that we need in advance.
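To make the configuration idea concrete, here is a minimal sketch of how extra columns could be declared and how a configuration change that touches the schema could be detected. Everything in it is an assumption for illustration: the `EXTRA_COLUMNS` mapping, the dimension names on the right-hand side (in particular `physical_filter` for lsst_filter), and both helper functions are hypothetical and are not the daf_butler or dax_obscore API.

```python
# Hypothetical sketch only: these names do not come from daf_butler or dax_obscore.

# Assumed mapping of extra obscore column name -> data ID key that fills it.
EXTRA_COLUMNS = {
    "lsst_visit": "visit",
    "lsst_detector": "detector",
    "lsst_tract": "tract",
    "lsst_patch": "patch",
    "lsst_band": "band",
    "lsst_filter": "physical_filter",  # assumption: filter comes from physical_filter
}


def extra_column_values(data_id, columns=EXTRA_COLUMNS):
    """Return values for the extra columns; keys missing from data_id become None."""
    return {column: data_id.get(key) for column, key in columns.items()}


def schema_changes(old_columns, new_columns):
    """Return (added, removed) column names between two configurations.

    A non-empty result means the table schema would change, which is the
    case that needs a coordinated migration.
    """
    added = sorted(set(new_columns) - set(old_columns))
    removed = sorted(set(old_columns) - set(new_columns))
    return added, removed
```

For example, `schema_changes(EXTRA_COLUMNS, {**EXTRA_COLUMNS, "lsst_exposure": "exposure"})` reports one added column and none removed, which is exactly the kind of change that would have to be coordinated with a schema migration.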
re access_url variations
This is again a matter of configuration, but I also think that some aspects may be easier to handle on the TAP server side, or maybe in a special database view, which should be easier to modify than migrating the data in the table. I'll keep that in mind while working on the implementation.
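As a rough illustration of the view idea, the sketch below uses an in-memory SQLite database (standing in for Postgres) where the stored table holds only the dataset ID and a view computes access_url. The table name `obscore_raw`, the view name `obscore`, the URL prefixes, and the helper functions are all hypothetical. The point is only that changing the URL form means recreating the view, not rewriting stored rows.

```python
import sqlite3

# Hypothetical sketch: SQLite stands in for the real database, and the
# table, view, and URL names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE obscore_raw (dataset_id TEXT PRIMARY KEY, dataproduct_type TEXT)"
)
conn.execute("INSERT INTO obscore_raw VALUES ('abc123', 'image')")


def create_obscore_view(conn, url_prefix):
    """(Re)create the view that derives access_url from the stored dataset_id."""
    # SQLite does not allow bound parameters inside a view definition,
    # so the prefix is embedded as an escaped string literal.
    literal = url_prefix.replace("'", "''")
    conn.execute("DROP VIEW IF EXISTS obscore")
    conn.execute(
        "CREATE VIEW obscore AS "
        "SELECT dataset_id, dataproduct_type, "
        f"'{literal}' || dataset_id AS access_url "
        "FROM obscore_raw"
    )


def access_urls(conn):
    """Return all access_url values from the view, ordered by dataset_id."""
    rows = conn.execute("SELECT access_url FROM obscore ORDER BY dataset_id")
    return [row[0] for row in rows]


create_obscore_view(conn, "https://example.org/links?ID=")
```

Calling `create_obscore_view` again with a different prefix changes every access_url returned by the view while leaving `obscore_raw` untouched.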
Joining PPDB and obscore is, I think, a whole new requirement. I believe it implies that we need to either have the Registry on the same server as the PPDB, or replicate the obscore table to the PPDB server. We probably need to know more specifics about this to decide what we can do.
I think there is nothing more to add on this ticket, and there is already implementation work done on other tickets. One possible issue with PPDB integration will need to be addressed on a separate ticket. I'm going to close this one.
Another near-requirement, I think, and probably again one that's not going to be difficult to meet:
From the perspective of just the content of the ObsCore table, we want enough configurability that the access_url can be generated in a way compatible with at least the following situations, covering both the "direct" and "CADC" (indirect access via a DataLink "links service") models:
In the "direct" model, the access_format would be determined based on the DatasetType of a dataset. In the CADC model (which is what we're using for the statically extracted service for DP0.2) it's a single standard-prescribed value for a DataLink "links response" table.
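The two ways of choosing access_format described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the `model` argument, and the dataset-type-to-MIME-type table are hypothetical and would really come from configuration. The DataLink value is the content type that the DataLink standard prescribes for a links response.

```python
# Content type the DataLink standard prescribes for a "links response" table.
DATALINK_FORMAT = "application/x-votable+xml;content=datalink"

# Hypothetical, illustrative mapping of DatasetType name -> MIME type.
# In a real deployment this would be part of the obscore configuration.
FORMAT_BY_DATASET_TYPE = {
    "raw": "image/fits",
    "calexp": "image/fits",
    "deepCoadd": "image/fits",
}


def access_format(dataset_type, model):
    """Return the access_format for a dataset under the given access model.

    model is "direct" (per-DatasetType format) or "cadc" (always DataLink).
    """
    if model == "cadc":
        # Indirect access: every row points at a links service response.
        return DATALINK_FORMAT
    if model == "direct":
        try:
            return FORMAT_BY_DATASET_TYPE[dataset_type]
        except KeyError:
            raise ValueError(
                f"no access_format configured for dataset type {dataset_type!r}"
            ) from None
    raise ValueError(f"unknown access model: {model!r}")
```

Keeping both branches behind one function is one way to avoid foreclosing the "direct" model in code while still defaulting to the links service where that is what is deployed.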
I want to preserve the ability to use the "direct" model in smaller-scale situations.
Again, from what I understand about what Andy did originally, and is likely to continue to do, I see no reason for this to be any sort of problem to meet.