RFC-484: The Gen3 Butler Registry Schema


    Details

    • Type: RFC
    • Status: Implemented
    • Resolution: Done
    • Component/s: DM
    • Labels:
      None

      Description

      An initial proposal for the Generation 3 Butler Registry (database) schema is now available for DM-wide review at https://dmtn-073.lsst.io/.

      This schema is both an important part of the Gen3 Butler implementation and a major public interface: in addition to pure-Python APIs, the Gen3 Butler will support direct SQL SELECT queries via a Python client. Those two limitations (SELECT-only access, and indirection through a Python client) permit any of the "tables" in the schema to be implemented as temporary tables or views over some more extensive, private database schema.
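      The following sketch illustrates that indirection using Python's built-in sqlite3 module. SQLite is an assumption here; no particular database engine is implied. A hypothetical public "Dataset" table is really a view over hypothetical private tables (_run, _dataset_storage). A hypothetical ReadOnlyRegistryClient forwards only SELECT statements, and SQLite's query_only pragma acts as a second guard. None of these names come from daf_butler.

```python
import sqlite3

# Hypothetical "private" schema: the real storage layout, free to change.
PRIVATE_SCHEMA = """
CREATE TABLE _run (run_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE _dataset_storage (
    dataset_id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES _run (run_id),
    dataset_type TEXT NOT NULL
);
-- Hypothetical "public" table: a view over the private tables, so the
-- private layout can evolve without breaking queries written against it.
CREATE VIEW Dataset AS
    SELECT d.dataset_id, d.dataset_type, r.name AS run
    FROM _dataset_storage AS d JOIN _run AS r USING (run_id);
"""


class ReadOnlyRegistryClient:
    """Hypothetical client that only forwards SELECT statements."""

    def __init__(self, connection: sqlite3.Connection):
        self._conn = connection
        # Second guard: ask SQLite itself to refuse writes on this connection.
        self._conn.execute("PRAGMA query_only = ON")

    def select(self, sql: str, params=()):
        # Crude first guard: reject anything that does not start with SELECT.
        if not sql.lstrip().upper().startswith("SELECT"):
            raise PermissionError("only SELECT queries are permitted")
        return self._conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(PRIVATE_SCHEMA)
conn.executescript("""
INSERT INTO _run VALUES (1, 'raw/all');
INSERT INTO _dataset_storage VALUES (10, 1, 'raw'), (11, 1, 'calexp');
""")

client = ReadOnlyRegistryClient(conn)
rows = client.select(
    "SELECT dataset_id, run FROM Dataset WHERE dataset_type = ?", ("calexp",)
)
print(rows)  # [(11, 'raw/all')]
```

      Clients can only read through the view, so the private tables behind it could be reorganized without changing any query written against Dataset. The prefix check is deliberately crude: it would also reject a read-only WITH ... SELECT query, so a real client would need something more careful.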

      This is a long RFC (the current planned end date is the end of LSST2018), as I'd like every stakeholder in DM to have a chance to look it over and comment. I would also very much appreciate it if DM management could budget time for detailed reviews by certain architectural and domain experts, including any time they may need to help improve the schema. I am very much not an expert in many of the domains this schema touches, and would very much appreciate help in making it meet our requirements.

      I expect the proposed schema to evolve at least slightly over the course of the review. Please add yourself as a watcher of this ticket to be notified of any changes.

      Some important questions I'd like to have explicitly considered include:

      • Can we imagine an underlying schema for all production databases (including the DBB and per-user databases in the science platform) that would meet our performance and long-term-support requirements?
      • Are there any concepts that need extra levels of indirection to support versioning and evolution over the course of the survey? (Note that versioning can also be handled in a lower-level private schema when simultaneous access to multiple versions via a single Butler client is not needed.)
      • Is the data model defined by the concrete "DataUnit" concepts defined here rich enough to describe all LSST data products? (I am especially concerned about calibration products.)
      • Are the descriptions of Visit and Exposure metadata sufficient for identifying and selecting subsets of data via the Butler, and is the relationship between those two concepts appropriate?
      • How can we maximize consistency in observational metadata terminology and conventions across this schema, public LSST database schemas, LSST class objects (especially afw.image.VisitInfo), the LSST EFD, and established VO concepts?

      Finally, there is a meta-question: what should accepting this document entail? I imagine it involves a transition to some kind of LDM document. One candidate is a section in a middleware design document; another is a standalone document.

            Activity

            afausti Angelo Fausti added a comment (edited)

            Jim Bosch: sounds good. If the "metric values" tables live in a different database, we could still use a mechanism like a database link in Oracle to do an SQL join with the Registry tables.
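            Oracle database links are hard to show in a self-contained example. As a stand-in, SQLite's ATTACH DATABASE gives one connection access to a second, separate database, so a single SQL statement can join across both. The table names, columns, and metric names below are hypothetical.

```python
import sqlite3

# Main connection: stands in for the Registry database (hypothetical schema).
conn = sqlite3.connect(":memory:")
# Attach a second, separate database under the schema name "metrics"; it
# stands in for a metrics database that lives apart from the Registry.
conn.execute("ATTACH DATABASE ':memory:' AS metrics")

conn.execute("CREATE TABLE visit (visit_id INTEGER PRIMARY KEY, physical_filter TEXT)")
conn.execute(
    "CREATE TABLE metrics.metric_value (visit_id INTEGER, metric TEXT, value REAL)"
)
conn.executemany("INSERT INTO visit VALUES (?, ?)", [(1, "g"), (2, "r")])
conn.executemany(
    "INSERT INTO metrics.metric_value VALUES (?, ?, ?)",
    [(1, "example_metric", 0.5), (2, "example_metric", 0.7), (2, "other_metric", 1.0)],
)

# A single SQL join spanning both databases.
joined = conn.execute(
    """
    SELECT v.visit_id, v.physical_filter, m.value
    FROM visit AS v
    JOIN metrics.metric_value AS m ON m.visit_id = v.visit_id
    WHERE m.metric = ?
    ORDER BY v.visit_id
    """,
    ("example_metric",),
).fetchall()
print(joined)  # [(1, 'g', 0.5), (2, 'r', 0.7)]
```

            With an Oracle database link, the remote table would instead be referenced through the link name (e.g. metric_value@some_link). The shape of the join is the same.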

            ebellm Eric Bellm added a comment

            Jim Bosch: I talked about this extensively with Angelo Fausti in the QA working group; see, e.g., https://confluence.lsstcorp.org/display/DM/Tidy+Data

            jbosch Jim Bosch added a comment

            Quoting Eric Bellm: "I talked about this extensively with Angelo Fausti in the QA working group; see, e.g., https://confluence.lsstcorp.org/display/DM/Tidy+Data"

            Perfect - sounds like we're all already on the same page.

            jbosch Jim Bosch added a comment

            Adopting this after a discussion between Fritz Mueller, Michelle Gower, Michelle Butler [X], and myself, with the following conclusions:

            • DM-15210 will make a number of minor changes to the schema, and sync the tech note with changes already made in daf_butler. These include adding (ra, dec) to Tract and moving size and checksum into Datastore-private tables.
            • DM-15536 will add a level of indirection between Exposure and Visit to make it possible to explore different associations of snaps into visits within the same data repository.
            • DM-15537 will rename Sensor to Detector to match current use in cameraGeom.

            After those changes, we will consider this RFC implemented. However, there are three areas in which the schema will remain very much provisional until they have been investigated further, resulting in new RFCs that propose changes to the provisional schema to address these issues. Those issues are:

            • How to define unique integer dataset_id values across different DBB endpoints and user spaces. Michelle Gower will lead this investigation, which will include at least those who have expressed opinions on that question here.
            • How to anticipate and support schema evolution. Responsible party TBD.
            • What the fields for observational metadata should be, which units, time systems, and reference frames to use, and how this should be split up between Exposure and Visit (and possibly new tables that also depend on Detector/Sensor). This includes how to maximize compatibility with CAOM2. It is probably a prerequisite for the work described here. Responsible party TBD.

            jbosch Jim Bosch added a comment

            Much of this RFC is no longer relevant, as we are no longer considering the SQL schema to be a public interface of the Gen3 butler - it's simply too hard to do (cost-free) indirection in SQL, as well as hard to make things consistent across database engines.

            A major part that is still relevant is the observational metadata, and I've created a new triggering ticket for this RFC to capture that (DM-24575). Contrary to the last post, I don't think that merits a new RFC anymore. Once that ticket is done, I think it'll be time to mark this RFC implemented.


              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              jbosch Jim Bosch
              Watchers:
              Andy Salnikov, Angelo Fausti, Brian Van Klaveren, Christopher Stephens [X] (Inactive), Christopher Waters, Colin Slater, David Shupe, Eric Bellm, Fritz Mueller, Gregory Dubois-Felsmann, Jim Bosch, John Swinbank, Jonathan Sick, Kian-Tat Lim, Leanne Guy, Merlin Fisher-Levine, Michelle Gower, Russell Owen, Tim Jenness, Wil O'Mullane
              Votes:
              0
              Watcher count:
              20

                Dates

                Created:
                Updated:
                Resolved:
                Planned End:
