An initial proposal for the Generation 3 Butler Registry (database) schema is now available for DM-wide review at https://dmtn-073.lsst.io/.
This schema is both an important part of the Gen3 Butler implementation and a major public interface: in addition to pure-Python APIs, the Gen3 Butler will support direct SQL SELECT queries via a Python client. Those two restrictions (SELECT-only access, and access only through a Python client) mean that any of the "tables" in the schema can be implemented as a temporary table or a view over some more extensive, private database schema.
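To illustrate why those two restrictions leave room for a private underlying schema, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical private layout (the `_dataset_storage` and `_dataset_retired` tables) and exposes a public `Dataset` "table" as a view over it; a connection authorizer then rejects anything other than reads. None of these table or column names come from DMTN-073, and SQLite merely stands in for whatever production database is eventually used.

```python
import sqlite3

# Hypothetical private schema, for illustration only (not taken from DMTN-073).
PRIVATE_SCHEMA = """
CREATE TABLE _dataset_storage (
    dataset_id   INTEGER PRIMARY KEY,
    dataset_type TEXT NOT NULL,
    visit        INTEGER,
    uri          TEXT NOT NULL
);
CREATE TABLE _dataset_retired (
    dataset_id INTEGER PRIMARY KEY
);
-- Public "table" seen by clients: a view that hides retired datasets.
CREATE VIEW Dataset AS
    SELECT s.dataset_id, s.dataset_type, s.visit, s.uri
    FROM _dataset_storage AS s
    WHERE s.dataset_id NOT IN (SELECT dataset_id FROM _dataset_retired);
"""

# Authorizer action codes a read-only query may trigger; everything else is denied.
_READ_ONLY_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def _select_only(action, arg1, arg2, db_name, source):
    """sqlite3 authorizer callback: allow reads; deny writes, DDL, transactions, etc."""
    return sqlite3.SQLITE_OK if action in _READ_ONLY_ACTIONS else sqlite3.SQLITE_DENY


def make_registry_connection():
    """Build the private schema, load toy rows, then restrict the connection to reads."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(PRIVATE_SCHEMA)
    conn.executemany(
        "INSERT INTO _dataset_storage VALUES (?, ?, ?, ?)",
        [(1, "calexp", 100, "file:a"), (2, "calexp", 101, "file:b"), (3, "src", 100, "file:c")],
    )
    conn.execute("INSERT INTO _dataset_retired VALUES (2)")
    conn.commit()
    # From here on, clients can only read; the private layout is an implementation detail.
    conn.set_authorizer(_select_only)
    return conn


if __name__ == "__main__":
    conn = make_registry_connection()
    print(conn.execute("SELECT dataset_id, dataset_type FROM Dataset ORDER BY dataset_id").fetchall())
    # prints [(1, 'calexp'), (3, 'src')]
    try:
        conn.execute("DELETE FROM _dataset_storage")
    except sqlite3.DatabaseError as err:
        print("rejected:", err)
```

Because clients can only ever SELECT, the private tables behind `Dataset` could be reorganized (or versioned) freely, as long as the view keeps presenting the same columns.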
This is a long RFC (the planned end date is currently the end of LSST2018), because I'd like every stakeholder in DM to have a chance to look it over and comment. I would also greatly appreciate it if DM management could budget time for detailed reviews by certain architectural and domain experts, including any time they may need to help improve the schema. I am not an expert in many of the domains this schema touches, and would welcome help in making it meet our requirements.
I expect the proposed schema to evolve at least slightly over the course of the review. Please add yourself as a watcher of this ticket to be notified of any changes.
Some important questions I'd like reviewers to consider explicitly include:
- Can we imagine an underlying schema for all production databases (including the DBB and per-user databases in the science platform) that would meet our performance and long-term-support requirements?
- Are there any concepts that need extra levels of indirection to support versioning and evolution over the course of the survey? (Note that versioning can also be handled in a lower-level private schema when simultaneous access to multiple versions via a single Butler client is not needed.)
- Is the data model formed by the concrete "DataUnit" concepts defined here rich enough to describe all LSST data products? (I am especially concerned about calibration products.)
- Are the descriptions of Visit and Exposure metadata sufficient for identifying and selecting subsets of data via the Butler, and is the relationship between those two concepts appropriate?
- How can we maximize consistency in observational metadata terminology and conventions across this schema, public LSST database schemas, LSST class objects (especially afw.image.VisitInfo), the LSST EFD, and established VO concepts?
Finally, there is a meta-question: what should accepting this document entail? I imagine it involves a transition to some kind of LDM document; one candidate is a section in a middleware design document, while another is a standalone document.