  1. Data Management
  2. DM-6964

Make a proposal for API support for representation of relationships between table columns

    Details

    • Type: Improvement
    • Status: In Progress
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Design Documents, SUIT
    • Labels:
    • Story Points:
      3
    • Team:
      Architecture

      Description

      End users and the SUIT need to be able to determine a variety of relationships between columns in the tabular data products produced by LSST. The particular example motivating this ticket is the need to answer the question "where in the table is the uncertainty data for column 'x'?".

      The answer could be:

      • "there isn't any"
      • "a symmetric Gaussian uncertainty is in column 'sigma_x'"
      • "asymmetric Gaussian uncertainties are in columns 'sigmaplus_x' and 'sigmaminus_x'"
      • "'x' is correlated with 'y' and the covariance matrix is in 'covar_xx', 'covar_xy', and 'covar_yy'"
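The four answers above can be pictured as a small set of record types plus a lookup. The following is only a minimal sketch under stated assumptions: the class names, the `ColumnRelationships` registry, and the extra columns `a` and `b` are hypothetical illustrations and are not part of afw.table, Felis, or any existing LSST API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass
class SymmetricUncertainty:
    """A single column holds a symmetric Gaussian sigma."""
    sigma: str


@dataclass
class AsymmetricUncertainty:
    """Two columns hold the upper and lower Gaussian sigmas."""
    plus: str
    minus: str


@dataclass
class Covariance:
    """Several correlated quantities share one covariance matrix."""
    quantities: Tuple[str, ...]
    # Maps a pair of quantity names to the column holding that matrix element.
    elements: Dict[Tuple[str, str], str]


Uncertainty = Union[SymmetricUncertainty, AsymmetricUncertainty, Covariance]


class ColumnRelationships:
    """Hypothetical registry answering: where is the uncertainty for column X?"""

    def __init__(self) -> None:
        self._by_column: Dict[str, Uncertainty] = {}

    def declare(self, column: str, uncertainty: Uncertainty) -> None:
        self._by_column[column] = uncertainty

    def uncertainty_for(self, column: str) -> Optional[Uncertainty]:
        # None corresponds to the answer "there isn't any".
        return self._by_column.get(column)


relationships = ColumnRelationships()
relationships.declare("a", SymmetricUncertainty(sigma="sigma_a"))
relationships.declare(
    "b", AsymmetricUncertainty(plus="sigmaplus_b", minus="sigmaminus_b")
)
xy = Covariance(
    quantities=("x", "y"),
    elements={("x", "x"): "covar_xx", ("x", "y"): "covar_xy", ("y", "y"): "covar_yy"},
)
relationships.declare("x", xy)
relationships.declare("y", xy)
```

A caller such as a plotting layer would branch on the type returned by `uncertainty_for` rather than parse column names, which is the property the ticket is asking for.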

      Ideally we would find a way for these relationships to be defined when the Apps code generates its afw.table outputs, discoverable through an API usable in the afw.table context, exportable to the database, and made available to end users and the SUIT. It should be usable whether the data are delivered to end users as reconstituted afw.table objects or as tables in common Python formats (at least Astropy tables).

      It should assist the SUIT in determining how to (automatically, though optionally) display uncertainty data when the primary data are requested.

      This ticket takes the position that a solution consisting purely of a documented convention for prefixes on column names is inadequate. We would like to avoid implementing that convention in code in, potentially, hundreds of places, and we would like to avoid requiring end users to know the convention in order to see proper displays with error bars.

        Attachments

          Issue Links

            Activity

            gpdf Gregory Dubois-Felsmann added a comment - - edited

            Transcript of conversation in Data Access HipChat room about this topic added as attached file.

            gpdf Gregory Dubois-Felsmann added a comment - - edited

            Another possible answer might be:

            • "the uncertainty isn't in the table, but that column is known to have a constant Gaussian uncertainty of 0.754" - and in this case, we'd want to figure out where that number would come from. Ideally we wouldn't make end users copy and paste it out of some Data Quality report.
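One way to picture this case is a relationship record that carries the value itself together with a note on where it came from, so that a display layer never needs a hand-copied number. This is a sketch only: `ConstantUncertainty`, its `provenance` field, and `error_bars` are hypothetical names, and the provenance string is a placeholder rather than a real document reference.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ConstantUncertainty:
    """A Gaussian sigma that applies to every row and is not stored in the table."""
    sigma: float
    # Free-text pointer to whatever defines the number (placeholder value below).
    provenance: Optional[str] = None


def error_bars(
    values: List[float], uncertainty: ConstantUncertainty
) -> List[Tuple[float, float]]:
    """Return (low, high) one-sigma bounds for each value."""
    return [(v - uncertainty.sigma, v + uncertainty.sigma) for v in values]


unc = ConstantUncertainty(sigma=0.754, provenance="placeholder: source of the value")
bars = error_bars([1.0, 2.0], unc)
```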
            gpdf Gregory Dubois-Felsmann added a comment -

            I think it's possible to make some progress on this ticket now at long last.

            I believe the Rubin/LSST conceptual baseline is for these relationships to be expressed as "Column Grouping" entities in the Felis definitions of the Science Data Model.  See https://felis.lsst.io/#column-groupings for a currently very brief description of the syntax.

            The precise syntax for expressing the relationship use cases in this ticket is not yet worked out, but that work can now proceed.  I would suggest trying out some worked examples during 2020.

            To relate this back to the ticket description: the decision to do this at the SDM Felis level means that we have not found a "way for these relationships to be defined when the Apps code generates its afw.table outputs, discoverable through an API usable in the afw.table context". Instead, formal relationships of this nature are documented in the data model only at the "Standardization" phase.

            Thus, at the pipeline-task-output afw.table level, relationships of this nature will be expressed in the documentation of the task generating the table, and likely by the use of typographical conventions for the names of columns related by "value-uncertainty" relationships.  The author of the standardization code will be expected to be familiar with the actual afw.table structure and these conventions, and transfer the appropriate values into the standardized table.

            The SDM Felis model will formally define the semantic relationships involved, and then we will need the TAP service to use information from the Felis model to guide the creation of appropriate <GROUP> elements in the VOTable headers it creates.
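As an illustration of the TAP-side step, the sketch below emits a VOTable `<GROUP>` containing `<FIELDref>` children that point at `<FIELD>` IDs. The element and attribute names follow the VOTable standard; everything else is an assumption made for the example: the `build_group` helper, the grouping name, and the idea that the Felis information arrives as a plain list of column IDs. It is not the TAP service's actual code, and the fragment omits the VOTable namespace and the enclosing `VOTABLE`/`RESOURCE` elements.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def build_group(
    parent: ET.Element, name: str, column_ids: List[str], ucd: Optional[str] = None
) -> ET.Element:
    """Append a GROUP to `parent` that references existing FIELDs by ID."""
    attrs = {"name": name}
    if ucd is not None:
        attrs["ucd"] = ucd
    group = ET.SubElement(parent, "GROUP", attrs)
    for column_id in column_ids:
        # FIELDref/@ref must match the ID attribute of a FIELD in the same table.
        ET.SubElement(group, "FIELDref", {"ref": column_id})
    return group


# Hypothetical table with a value column and its uncertainty column.
table = ET.Element("TABLE", {"name": "example"})
for col_id, col_ucd in [("x", "phot.flux"), ("sigma_x", "stat.error;phot.flux")]:
    ET.SubElement(
        table,
        "FIELD",
        {"ID": col_id, "name": col_id, "datatype": "double", "ucd": col_ucd},
    )

build_group(table, "x_with_uncertainty", ["x", "sigma_x"])
votable_fragment = ET.tostring(table, encoding="unicode")
```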

            In simple cases, the value-uncertainty relationships can be expressed in Felis simply by documenting composite UCDs such as "stat.error;phot.flux" for the error on a primary column with UCD "phot.flux".  More complex cases, like a covariance matrix for a pair of quantities, will probably require the definition of a JSON-LD-style vocabulary term for the relationship.
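For the simple case, a client could locate the error column by building the composite UCD from the primary column's UCD. This is a minimal sketch, assuming the column metadata is available as a name-to-UCD mapping; `find_error_columns` and the sample column names are hypothetical.

```python
from typing import Dict, List


def find_error_columns(ucds: Dict[str, str], primary: str) -> List[str]:
    """Return the columns whose UCD is 'stat.error;' plus the primary column's UCD."""
    wanted = ("stat.error;" + ucds[primary]).lower()
    return [
        name
        for name, ucd in ucds.items()
        if name != primary and ucd.lower() == wanted
    ]


# Hypothetical column metadata: column name -> UCD.
column_ucds = {
    "x": "phot.flux",
    "sigma_x": "stat.error;phot.flux",
    "ra": "pos.eq.ra",
}

errors_for_x = find_error_columns(column_ucds, "x")
errors_for_ra = find_error_columns(column_ucds, "ra")
```

This lookup is only unambiguous when a single column in the table carries each UCD. A table with several `phot.flux` columns would return several candidates, which is one reason explicit groupings are still needed.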

            The low-level engineering in Firefly to recognize <GROUP> entities in VOTables and allow the application to reason about them is in progress and should allow an example to be worked through this year.
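On the consuming side, recognizing `<GROUP>` entities amounts to resolving each `<FIELDref>` back to the `<FIELD>` it names. The sketch below shows that resolution in Python with the standard library. It is an illustration only and says nothing about how Firefly implements it; `read_groups` and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SAMPLE = """<TABLE name="example">
  <FIELD ID="x" name="x" datatype="double" ucd="phot.flux"/>
  <FIELD ID="sigma_x" name="sigma_x" datatype="double" ucd="stat.error;phot.flux"/>
  <GROUP name="x_with_uncertainty">
    <FIELDref ref="x"/>
    <FIELDref ref="sigma_x"/>
  </GROUP>
</TABLE>"""


def _local(tag: str) -> str:
    """Strip an XML namespace prefix such as '{uri}FIELD' down to 'FIELD'."""
    return tag.rsplit("}", 1)[-1]


def read_groups(table_xml: str) -> Dict[str, List[str]]:
    """Map each GROUP name to the names of the FIELDs it references."""
    root = ET.fromstring(table_xml)
    fields = {
        el.get("ID"): el.get("name")
        for el in root.iter()
        if _local(el.tag) == "FIELD" and el.get("ID")
    }
    groups: Dict[str, List[str]] = {}
    for el in root.iter():
        if _local(el.tag) == "GROUP":
            refs = [c.get("ref") for c in el if _local(c.tag) == "FIELDref"]
            # Silently skip references to IDs that are not declared as FIELDs.
            groups[el.get("name")] = [fields[r] for r in refs if r in fields]
    return groups


parsed_groups = read_groups(SAMPLE)
```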


            The next step on this ticket will be to more explicitly identify the individual pieces of work that would be involved in a proof-of-concept exercise.


              People

              Assignee:
              gpdf Gregory Dubois-Felsmann
              Reporter:
              gpdf Gregory Dubois-Felsmann
              Watchers:
              Brian Van Klaveren, Colin Slater, Fritz Mueller, Gregory Dubois-Felsmann, Jim Bosch, John Swinbank, Kian-Tat Lim, Yusra AlSayyad
