  Data Management / DM-35850

Collect ideas and requirements for daf_butler obscore implementation


    Details

    • Story Points: 10
    • Sprint: DB_F22_6, DB_S23_6
    • Team: Data Access and Database
    • Urgent?: No

      Description

      Time to start implementing the ObsCore table as part of the daf_butler Registry. Before doing any actual work, I want to spend some time trying to understand all the requirements and options we need to consider. The TechNote already lists a few ideas for the implementation; I'm going to expand on them in more detail and also try to figure out the issues related to schema migration. Also, dax_obscore already has some parts that could be reused.


            Activity

            Gregory Dubois-Felsmann (gpdf) added a comment:

            Another near-requirement, I think, and probably again one that's not going to be difficult to meet:

            From the perspective of just the content of the ObsCore table, the configuration should make it possible to generate the access_url in a way compatible with at least the following situations, covering both the "direct" and the "CADC" (indirect access via a DataLink "links service") models:

            1. file:// URL to the physical files when on a POSIX filesystem (direct)
            2. templated https:// URL to the physical files when such a server is available in a particular installation (direct)
            3. templated https:// URL to a "links service" (CADC)

            In the "direct" model, the access_format would be determined from the DatasetType of a dataset. In the CADC model (which is what we're using for the statically extracted service for DP0.2), it is a single value prescribed by the standard for a DataLink "links response" table.
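            A minimal Python sketch of how this configurability could look, covering the three situations above. Everything here is hypothetical: `AccessUrlConfig`, its `mode` values, the `{path}`/`{id}` template placeholders and the `example.org` hosts are illustrative names, not daf_butler or dax_obscore APIs. The only value taken from a standard is the DataLink links-response MIME type. Note that the two direct modes need the file path, which would have to be obtained from the datastore.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import quote

# MIME type that the IVOA DataLink standard prescribes for a "links response".
DATALINK_FORMAT = "application/x-votable+xml;content=datalink"


@dataclass
class AccessUrlConfig:
    """Hypothetical access_url settings for one installation."""

    mode: str  # "file" or "https" (direct model), or "datalink" (CADC model)
    url_template: str = ""  # uses {path} in "https" mode, {id} in "datalink" mode
    formats: dict[str, str] = field(default_factory=dict)  # DatasetType name -> MIME type


def access_columns(
    cfg: AccessUrlConfig, dataset_id: str, dataset_type: str, path: str
) -> tuple[str, Optional[str]]:
    """Return (access_url, access_format) for one dataset.

    ``path`` is the absolute POSIX path of the file; only the direct model uses it.
    """
    if cfg.mode == "file":
        # PurePosixPath.as_uri() requires an absolute path and yields file:///...
        return PurePosixPath(path).as_uri(), cfg.formats.get(dataset_type)
    if cfg.mode == "https":
        # Templated URL to the file on an HTTP server; format depends on DatasetType.
        url = cfg.url_template.format(path=quote(path.lstrip("/")))
        return url, cfg.formats.get(dataset_type)
    if cfg.mode == "datalink":
        # CADC model: URL points at the links service, one fixed format for every row.
        url = cfg.url_template.format(id=quote(dataset_id, safe=""))
        return url, DATALINK_FORMAT
    raise ValueError(f"unknown access_url mode: {cfg.mode!r}")
```

            Keeping the mode in configuration rather than in code is what leaves the "direct" model available for small installations while large ones use the links service.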

            I want to preserve the ability to use the "direct" model in smaller-scale situations.

            Again, from what I understand of what Andy did originally, and is likely to continue doing, I see no reason why this should be hard to meet.

            Tim Jenness (tjenness) added a comment:

            Couple of quick comments:

            • Using pgsphere means we have to support our own Postgres deployment on Google, since Cloud SQL does not support it (we wouldn't even be able to request it, given that no-one can agree on which pgsphere we are meant to use). Maybe we don't need any of this to run on Google, because a data release will be using Qserv and we only want this to run at USDF and the summit.
            • Storing the direct URIs means that we are never going to do composite disassembly. It now seems pretty clear that no-one wants to do composite disassembly, so I should give up the idea. The links service at least gave the illusion that on-the-fly assembly was something that could happen (along with on-the-fly format conversion).
            • Direct URIs also mean that the ObsCore manager has to be able to talk to the datastore to get those URIs (previously ObsCore was entirely registry-based).
            Gregory Dubois-Felsmann (gpdf) added a comment:

            Quick reactions:

            • Yes, the pgsphere issue does complicate cloud deployment. We've looked at that at IPAC as well in some contexts, w.r.t. the Amazon cloud (because of NASA's deal with Amazon). Amazon has pre-packaged configurations of Postgres including PostGIS, but not pgsphere. At some point the astronomy community might be well advised to find the resources, in some project, to build a full-featured ADQL solution against PostGIS, perhaps including adding additional capabilities to PostGIS itself. pgsphere, even if it is currently still being maintained, is looking like a fragile reed.
            • I have been assuming that in DRP releases we would need ObsCore tables both in Qserv (for use/joining with the catalogs, as we're doing on DP0.2) and in Postgres, as part of the CDB, for joining with observatory metadata. So I think we'll need TAP-with-geometry on Postgres even in DRP.
            • The PPDB is by default Postgres, I think, and we'll certainly need an ObsCore table for the AP image data products, ideally joinable with the PPDB catalog contents.
            • I am not suggesting that we would ever use the "direct" model on DRP outputs. I just don't want to foreclose it in the code itself. If composite disassembly becomes something we really need to do, I wouldn't object to it on the grounds that it prevents using the "direct" model.
            Andy Salnikov (salnikov) added a comment (edited):

            ... lsst_visit, lsst_detector, lsst_tract, lsst_patch, lsst_band, and lsst_filter columns

            Gregory Dubois-Felsmann, indeed, I assume we want to keep these; they (and probably other columns) can be added via simple configuration. One issue is that any configuration change that alters the obscore table schema will need some coordination, so we should try to anticipate everything we need in advance.
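            A minimal sketch, assuming the extra columns are configured as a plain mapping from each column name to the data ID key it is filled from. The mapping format and the lsst_filter/physical_filter pairing are illustrative assumptions, not the actual daf_butler configuration. It also shows why extending the configuration needs coordination: any column added to or removed from the mapping changes the table schema.

```python
from typing import Any, Mapping

# Hypothetical configuration: extra ObsCore column -> data ID key it is copied from.
# The lsst_filter -> physical_filter pairing is an assumption for illustration.
EXTRA_COLUMNS: dict[str, str] = {
    "lsst_visit": "visit",
    "lsst_detector": "detector",
    "lsst_tract": "tract",
    "lsst_patch": "patch",
    "lsst_band": "band",
    "lsst_filter": "physical_filter",
}


def extra_values(columns: Mapping[str, str], data_id: Mapping[str, Any]) -> dict[str, Any]:
    """Extra column values for one row; keys missing from the data ID give None (NULL)."""
    return {column: data_id.get(key) for column, key in columns.items()}


def schema_change(old: Mapping[str, str], new: Mapping[str, str]) -> tuple[list[str], list[str]]:
    """Return (added, removed) column names; anything non-empty means a schema migration."""
    return sorted(new.keys() - old.keys()), sorted(old.keys() - new.keys())
```

            For example, a visit-level image has no tract or patch in its data ID, so those columns come out as None, while adding a new entry to the mapping shows up as an added column that existing tables would have to be migrated to include.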

            re access_url variations

            This is again a matter of configuration, but I also think that some aspects may be easier to handle on the TAP server side, or perhaps in a dedicated database view, which would be easier to modify than migrating the data in the table. I'll keep that in mind while working on the implementation.
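            To illustrate the view idea, here is a sketch using SQLite from the Python standard library: the stored table keeps only the dataset ID, and a view builds access_url from a prefix at query time, so moving the links service means redefining the view rather than rewriting rows. The table name `obscore_base`, its columns, and the URLs are hypothetical; a real deployment would target PostgreSQL and its own schema.

```python
import sqlite3

# MIME type that the IVOA DataLink standard prescribes for a "links response".
DATALINK_FORMAT = "application/x-votable+xml;content=datalink"


def sql_literal(text: str) -> str:
    """Quote a string as an SQL literal (view definitions cannot use bound parameters)."""
    return "'" + text.replace("'", "''") + "'"


def replace_obscore_view(conn: sqlite3.Connection, links_prefix: str) -> None:
    """(Re)create the hypothetical "obscore" view; rows in obscore_base are untouched."""
    conn.execute("DROP VIEW IF EXISTS obscore")
    conn.execute(
        "CREATE VIEW obscore AS SELECT obs_id, dataproduct_type, "
        f"{sql_literal(links_prefix)} || dataset_id AS access_url, "
        f"{sql_literal(DATALINK_FORMAT)} AS access_format "
        "FROM obscore_base"
    )


# Demo with an in-memory database and a made-up schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE obscore_base (dataset_id TEXT PRIMARY KEY, obs_id TEXT, dataproduct_type TEXT)"
)
conn.execute("INSERT INTO obscore_base VALUES ('abc', 'obs1', 'image')")

replace_obscore_view(conn, "https://data.example.org/links?ID=")
first = conn.execute("SELECT access_url, access_format FROM obscore").fetchone()

# Moving the links service only requires redefining the view, not migrating data.
replace_obscore_view(conn, "https://other.example.org/api/links?ID=")
second = conn.execute("SELECT access_url FROM obscore").fetchone()[0]
```

            The trade-off is that access_url is computed on every query instead of being stored, which is cheap for string concatenation but means TAP clients see a view rather than a plain table.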

            re PPDB

            Joining PPDB and obscore is, I think, a whole new requirement. I believe it implies that we need to either host the Registry on the same server as the PPDB, or replicate the obscore table to the PPDB server. We probably need to know more specifics about this to decide what we can do.

            Andy Salnikov (salnikov) added a comment:

            I think there is nothing more to add on this ticket, and implementation work has already been done on other tickets. One possible issue with PPDB integration will need to be addressed on a separate ticket. I'm going to close this one.


              People

              Assignee: Andy Salnikov
              Reporter: Andy Salnikov
              Watchers: Andy Salnikov, Gregory Dubois-Felsmann, Jim Bosch, Tim Jenness
              Votes: 0
