Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-18524

Update LDM-153 with contents of Yaml schema

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Design Documents
    • Labels:
      None
    • Team:
      DM Science

      Description

      Update LDM-153 to be auto-generated from the contents of the felis-defined YAML files, which will be stored in the cat package.

        Attachments

          Activity

          No builds found.
          ctslater Colin Slater created issue -
          ctslater Colin Slater made changes -
          Field Original Value New Value
          Attachment baseline_vs_master_diff.txt [ 37301 ]
          Hide
          ctslater Colin Slater added a comment - - edited

          Attached a diff of the column list in the baseline LDM-153 vs. the column list currently on master of the cat package (sha ddbdf7d): baseline_vs_master_diff.txt. There are a substantial number of changes which have been made since the document was baselined. I'm willing to accept these as-is and RFC them in bulk; I don't know that there's much value in attempting to re-assess each ticket in the past 5 years.

          As a sanity check, the column list I obtain from scraping of the master version of cat SQL is identical to the yaml file. That is only checking the table names; not the various annotations.

          (edit: on the attached diff, the left column is baseline and the right column is the latest master)

          Show
          ctslater Colin Slater added a comment - - edited Attached a diff of the column list in the baseline LDM-153 vs. the column list currently on master of the cat package (sha ddbdf7d): baseline_vs_master_diff.txt . There are a substantial number of changes which have been made since the document was baselined. I'm willing to accept these as-is and RFC them in bulk; I don't know that there's much value in attempting to re-assess each ticket in the past 5 years. As a sanity check, the column list I obtain from scraping of the master version of cat SQL is identical to the yaml file. That is only checking the table names; not the various annotations. (edit: on the attached diff, the left column is baseline and the right column is the latest master)
          Hide
          ctslater Colin Slater added a comment -

          Fritz Mueller and Kian-Tat Lim, giving this to both of you for a sign-off before putting it up for RFC. Due to the number of changes that have been made to cat since the last version of the document, it is not column-for-column identical and thus does not pass KT's proposal for mechanical verification (see comment above and attached diff). I think a reasonable course is to just accepts those changes, but suggestions welcome.

          Brian Van Klaveren, let me know if you have any other modifications you'd like to get in to the yaml before it goes into cat, so we can take care of it soon.

          Show
          ctslater Colin Slater added a comment - Fritz Mueller and Kian-Tat Lim , giving this to both of you for a sign-off before putting it up for RFC. Due to the number of changes that have been made to cat since the last version of the document, it is not column-for-column identical and thus does not pass KT's proposal for mechanical verification (see comment above and attached diff). I think a reasonable course is to just accepts those changes, but suggestions welcome. Brian Van Klaveren , let me know if you have any other modifications you'd like to get in to the yaml before it goes into cat, so we can take care of it soon.
          ctslater Colin Slater made changes -
          Reviewers Fritz Mueller, Kian-Tat Lim [ fritzm, ktl ]
          Status To Do [ 10001 ] In Review [ 10004 ]
          Hide
          ktl Kian-Tat Lim added a comment -

          Looks OK to me, although it's not clear that procHistoryId is still useful since it has been removed from all the catalog tables (and thus there's no direct way of tying provenance to data).

          Show
          ktl Kian-Tat Lim added a comment - Looks OK to me, although it's not clear that procHistoryId is still useful since it has been removed from all the catalog tables (and thus there's no direct way of tying provenance to data).
          ktl Kian-Tat Lim made changes -
          Reviewers Fritz Mueller, Kian-Tat Lim [ fritzm, ktl ] Fritz Mueller [ fritzm ]
          Hide
          gpdf Gregory Dubois-Felsmann added a comment -

          To my mind this is not ready for approval, but this is mainly a concern about making sure that the narrative parts of the document reflect the way it is now generated.

          1. The document still alleges that the baseline schema originates in Enterprise Architect.
          2. Assuming that that is no longer true, do we have a plan for how to maintain the diagrams in the document? Can we use Graphviz on output extracted from foreign-key relationships in the Felis definitions?
          3. Are the diagrams in fact still valid? (I have not tried to examine this.)
          4. The document should (briefly) explain the production chain from Felis-in-Github through to the PDF, and should make clear to the reader which parts of the document are edited by hand and which are autogenerated. It may be useful in that regard to move the actual schema tables to an appendix, or to add a "Part 1" (English) and "Part 2" (autogenerated) structure above the current sections.
          5. The document must, for traceability, contain a SHA representing the state of the Felis code used as input.
          6. The launchpad.net link to SciSQL in Section 2 is probably out of date.
          7. I am not convinced that the discussion of UDFs is particularly germane here, though, given that either:
            1. If we are trying to inform the reader about the "native" query language we should be mentioning the qserv_* functions, too.
            2. If we are trying describe the actual user environment, we should at this point be discussing ADQL.
          8. It seems out of scope for this change request, but I am not convinced that the visit and exposure tables as defined here are consistent with the whole Butler Gen3 approach. That should perhaps be the subject of a separate assessment.
          9. Again out of scope, but is the provenance data model still a relevant baseline?
          Show
          gpdf Gregory Dubois-Felsmann added a comment - To my mind this is not ready for approval, but this is mainly a concern about making sure that the narrative parts of the document reflect the way it is now generated. The document still alleges that the baseline schema originates in Enterprise Architect. Assuming that that is no longer true, do we have a plan for how to maintain the diagrams in the document? Can we use Graphviz on output extracted from foreign-key relationships in the Felis definitions? Are the diagrams in fact still valid? (I have not tried to examine this.) The document should (briefly) explain the production chain from Felis-in-Github through to the PDF, and should make clear to the reader which parts of the document are edited by hand and which are autogenerated. It may be useful in that regard to move the actual schema tables to an appendix, or to add a "Part 1" (English) and "Part 2" (autogenerated) structure above the current sections. The document must, for traceability, contain a SHA representing the state of the Felis code used as input. The launchpad.net link to SciSQL in Section 2 is probably out of date. I am not convinced that the discussion of UDFs is particularly germane here, though, given that either: If we are trying to inform the reader about the "native" query language we should be mentioning the qserv_* functions, too. If we are trying describe the actual user environment, we should at this point be discussing ADQL. It seems out of scope for this change request, but I am not convinced that the visit and exposure tables as defined here are consistent with the whole Butler Gen3 approach. That should perhaps be the subject of a separate assessment. Again out of scope, but is the provenance data model still a relevant baseline?
          Hide
          ctslater Colin Slater added a comment -

          Thanks Gregory Dubois-Felsmann, I agree with basically all of those issues. There's a choice here between having a single RFC with both backend docgen changes + frontend text updates, or the path that I was originally aiming for with one RFC for the backend Felis-ization (with minimal content change), then afterwards do another RFC with substantial text updates.
          I'd be happy to discuss these options, but let's do it next week after the lsp review.

          Show
          ctslater Colin Slater added a comment - Thanks Gregory Dubois-Felsmann , I agree with basically all of those issues. There's a choice here between having a single RFC with both backend docgen changes + frontend text updates, or the path that I was originally aiming for with one RFC for the backend Felis-ization (with minimal content change), then afterwards do another RFC with substantial text updates. I'd be happy to discuss these options, but let's do it next week after the lsp review.
          Hide
          gpdf Gregory Dubois-Felsmann added a comment -

          I'd be OK with splitting it, too, but we'd at least need a "black box warning" at the top of the document about the need for other updates, I think. More next week...

          Show
          gpdf Gregory Dubois-Felsmann added a comment - I'd be OK with splitting it, too, but we'd at least need a "black box warning" at the top of the document about the need for other updates, I think. More next week...
          Hide
          ctslater Colin Slater added a comment -

          I pushed some new changes that address the most urgent of Gregory Dubois-Felsmann's comments. I've replaced the first section which described the old schema files with some text on the new Felis-based tools, removed the Enterprise Architect references, updated the stale links for SciSQL, and added a line at the start of the tables which shows the cat SHA1 that was used. I put some text in the README also as a quick guide for how to update the document with a new schema.

          With that, the TODO items I propose to defer (and ticket) are:

          • Deciding what to do with the diagrams
          • Description of ADQL, updates to the UDF discussion, possible mention of qserv_*
          • Exposure and Visit tables re: Gen3.
          • Checking if the provenance model is still relevant

          Does that seem like a reasonably passable state for an RFC, Gregory Dubois-Felsmann?

          Show
          ctslater Colin Slater added a comment - I pushed some new changes that address the most urgent of Gregory Dubois-Felsmann 's comments. I've replaced the first section which described the old schema files with some text on the new Felis-based tools, removed the Enterprise Architect references, updated the stale links for SciSQL, and added a line at the start of the tables which shows the cat SHA1 that was used. I put some text in the README also as a quick guide for how to update the document with a new schema. With that, the TODO items I propose to defer (and ticket) are: Deciding what to do with the diagrams Description of ADQL, updates to the UDF discussion, possible mention of qserv_* Exposure and Visit tables re: Gen3. Checking if the provenance model is still relevant Does that seem like a reasonably passable state for an RFC, Gregory Dubois-Felsmann ?
          Hide
          fritzm Fritz Mueller added a comment -

          This looks good to me, and I think it's a big step in the right direction. Gregory Dubois-Felsmann, please comment on Colin's note above – as this stands after Colin's most recent changes, is this in your opinion now ready for RFC?

          Show
          fritzm Fritz Mueller added a comment - This looks good to me, and I think it's a big step in the right direction. Gregory Dubois-Felsmann , please comment on Colin's note above – as this stands after Colin's most recent changes, is this in your opinion now ready for RFC?
          fritzm Fritz Mueller made changes -
          Status In Review [ 10004 ] Reviewed [ 10101 ]
          Hide
          ctslater Colin Slater added a comment -

          Thanks for the reviews. Merged to master; there's still time for comments during the RFC process.

          Show
          ctslater Colin Slater added a comment - Thanks for the reviews. Merged to master; there's still time for comments during the RFC process.
          ctslater Colin Slater made changes -
          Resolution Done [ 10000 ]
          Status Reviewed [ 10101 ] Done [ 10002 ]

            People

            Assignee:
            ctslater Colin Slater
            Reporter:
            ctslater Colin Slater
            Reviewers:
            Fritz Mueller
            Watchers:
            Colin Slater, Fritz Mueller, Gregory Dubois-Felsmann, Kian-Tat Lim
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:

                CI Builds

                No builds found.