Details
-
Type:
Story
-
Status: Done
-
Resolution: Done
-
Fix Version/s: None
-
Component/s: Design Documents
-
Labels:None
-
Team:DM Science
Description
Update LDM-153 to be auto-generated from the contents of the felis-defined YAML files, which will be stored in the cat package.
Attachments
Attachments
Activity
Fritz Mueller and Kian-Tat Lim, giving this to both of you for a sign-off before putting it up for RFC. Due to the number of changes that have been made to cat since the last version of the document, it is not column-for-column identical and thus does not pass KT's proposal for mechanical verification (see comment above and attached diff). I think a reasonable course is to just accepts those changes, but suggestions welcome.
Brian Van Klaveren, let me know if you have any other modifications you'd like to get in to the yaml before it goes into cat, so we can take care of it soon.
Looks OK to me, although it's not clear that procHistoryId is still useful since it has been removed from all the catalog tables (and thus there's no direct way of tying provenance to data).
To my mind this is not ready for approval, but this is mainly a concern about making sure that the narrative parts of the document reflect the way it is now generated.
- The document still alleges that the baseline schema originates in Enterprise Architect.
- Assuming that that is no longer true, do we have a plan for how to maintain the diagrams in the document? Can we use Graphviz on output extracted from foreign-key relationships in the Felis definitions?
- Are the diagrams in fact still valid? (I have not tried to examine this.)
- The document should (briefly) explain the production chain from Felis-in-Github through to the PDF, and should make clear to the reader which parts of the document are edited by hand and which are autogenerated. It may be useful in that regard to move the actual schema tables to an appendix, or to add a "Part 1" (English) and "Part 2" (autogenerated) structure above the current sections.
- The document must, for traceability, contain a SHA representing the state of the Felis code used as input.
- The launchpad.net link to SciSQL in Section 2 is probably out of date.
- I am not convinced that the discussion of UDFs is particularly germane here, though, given that either:
- If we are trying to inform the reader about the "native" query language we should be mentioning the qserv_* functions, too.
- If we are trying describe the actual user environment, we should at this point be discussing ADQL.
- It seems out of scope for this change request, but I am not convinced that the visit and exposure tables as defined here are consistent with the whole Butler Gen3 approach. That should perhaps be the subject of a separate assessment.
- Again out of scope, but is the provenance data model still a relevant baseline?
Thanks Gregory Dubois-Felsmann, I agree with basically all of those issues. There's a choice here between having a single RFC with both backend docgen changes + frontend text updates, or the path that I was originally aiming for with one RFC for the backend Felis-ization (with minimal content change), then afterwards do another RFC with substantial text updates.
I'd be happy to discuss these options, but let's do it next week after the lsp review.
I'd be OK with splitting it, too, but we'd at least need a "black box warning" at the top of the document about the need for other updates, I think. More next week...
I pushed some new changes that address the most urgent of Gregory Dubois-Felsmann's comments. I've replaced the first section which described the old schema files with some text on the new Felis-based tools, removed the Enterprise Architect references, updated the stale links for SciSQL, and added a line at the start of the tables which shows the cat SHA1 that was used. I put some text in the README also as a quick guide for how to update the document with a new schema.
With that, the TODO items I propose to defer (and ticket) are:
- Deciding what to do with the diagrams
- Description of ADQL, updates to the UDF discussion, possible mention of qserv_*
- Exposure and Visit tables re: Gen3.
- Checking if the provenance model is still relevant
Does that seem like a reasonably passable state for an RFC, Gregory Dubois-Felsmann?
This looks good to me, and I think it's a big step in the right direction. Gregory Dubois-Felsmann, please comment on Colin's note above – as this stands after Colin's most recent changes, is this in your opinion now ready for RFC?
Thanks for the reviews. Merged to master; there's still time for comments during the RFC process.
Attached a diff of the column list in the baseline LDM-153 vs. the column list currently on master of the cat package (sha ddbdf7d): baseline_vs_master_diff.txt
. There are a substantial number of changes which have been made since the document was baselined. I'm willing to accept these as-is and RFC them in bulk; I don't know that there's much value in attempting to re-assess each ticket in the past 5 years.
As a sanity check, the column list I obtain from scraping of the master version of cat SQL is identical to the yaml file. That is only checking the table names; not the various annotations.
(edit: on the attached diff, the left column is baseline and the right column is the latest master)