  1. Data Management
  2. DM-20011

Update glossary to include QA-related terms in the DMTN-085

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Story Points:
      1
    • Team:
      DM Science

      Description

      In QAWG-REC-1, the QA working group recommended adopting the definitions of QA-related terms in the DMTN-085 glossary subsystem-wide. This recommendation was accepted at the DMLT F2F meeting.

      This glossary should be audited for correctness and for clashes with higher-level LSST glossaries.

            Activity

            mgraham Melissa Graham added a comment -

            I have taken the Glossary terms from DMTN-085 and put them into the spreadsheet format used for DM-14877 (see the uploaded QA_Definitions_20190613.numbers and .csv files).

            Once the DMTN-085 authors (Eric Bellm or a co-author) review and correct the following, the attached file can be merged with the master Glossary file (probably by Tim Jenness), or MLG can provide a final merged spreadsheet file. No pre-existing Glossary terms needed to be modified based on these new terms, so merging should be straightforward.

            Not included because they did not have a definition:
            CI Continuous Integration
            HSC Hyper Suprime-Cam
            KPM Key Performance Metric
            LDF LSST Data Facility
            QAWG QA Strategy Working Group

            Incorporated with some modification:
            drill down "Move from a higher level aggregation of data to its inputs. For example, given data describing a tract, to drill down to constituent patches and then to objects. Also refers to the act of identifying an issue in a high-level summary of the data (e.g. an aberrant metric value) and interactively investigating its inputs to find the source of the problem."
            General Parallel File System "The bulk data storage provided through a POSIX filesystem interface at the LSST Data Facility. Refers specifically to IBM’s General Parallel File System; aslo known as IBM Spectrum Scale." GPFS
            metric value "The result of computing a particular metric on some given data. Note that metric values are typically computed rather than measured. See also: metric."
            monitoring "In DM QA, this refers to the process of collecting, storing, aggregating and visualizing metrics."
            Quality Assurance "All activities, deliverables, services, documents, procedures or artifacts which are designed to ensure the quality of DM deliverables. This may include QC systems, in so far as they are covered in the charge described in LDM-622. Note that contrasts with the LDM-522 definition of “QA” as “Quality Analysis”, a manual process which occurs only during commissioning and operations. See also: Quality Control." QA
            Quality Control "Services and processes which are aimed at measuring and monitoring a system to verify and characterize its performance (as in LDM-522). Quality Control systems run autonomously, only notifying people when an anomaly has been detected. See also Quality Assurance." QC
            releaseable product "A software package or other component of the DM system which is expected to be included in the next tagged release of the system. This implies inclusion in a standard top-level package. See also release-tag."

            Incorporated with little to no modification:
            aggregate metric "An aggregation of multiple point metrics. For example, the overall photometric repeatability for a particular tract given multiple observations of each star."
            aggregation "A single result—e.g., a metric value—computed from a collection of input values. For example, we can sum or average a metric computed over patches to produce an aggregate metric at tract level."
            Apache Parquet "A columnar storage data persistence format maintained by the Apache project; http://parquet.apache.org."
            dashboard "A visual display of the most important information needed to achieve one or more objectives, consolidated and arranged on a single screen so that the information can be monitored at a glance (Few, 2013)."
            metric "We follow the SQR-019 definition of a metric as a measurable quantities which may be tracked. A metric has a name, description, unit, references, and tags (which are used for grouping). A metric is a scalar by definition. We consider multiple types of metric in this document; see aggregate metric, model metric, point metric."
            model metric "A metric describing a model related to the data. For example, the coeficients of a 2D polynomial fit to the background of a single CCD exposure."
            point metric "A metric that is associated with a single entry in a catalog. Examples include the shape of a source, the standard deviation of the flux of an object detected on a coadd, the flux of an source detected on a difference image."
            tidy data "Tidy datasets have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table (Wickham, H., 2014, Journal of Statistical Software, Articles, 59, 1)."

            Already included in glossary:
            DM Data Management.
            provenance A description of the inputs and processes which have been used to generate a particular result or data product.
            SDQA Science Data Quality Assurance.
            SQuaSH Science Quality Analysis Harness; SQR-009; https://squash.lsst.codes.

            mgraham Melissa Graham added a comment -

            The slightly modified list of glossary terms should be reviewed by a DMTN-085 author to ensure none of the minor modifications have compromised the definitions.

            tjenness Tim Jenness added a comment -

            The master glossary file is now at https://github.com/lsst/lsst-texmf/blob/master/etc/glossarydefs.csv

            mgraham Melissa Graham added a comment -

            Thanks Tim Jenness, and I see it has acronyms as well, so not all terms need a definition to be included. Once I iterate with the DMTN-085 folks, I'll integrate the new terms via PR.

            ebellm Eric Bellm added a comment -

            A few comments:

            I think the example for aggregate metrics is not quite right; it should be "the overall photometric repeatability for a particular tract given the repeatability of multiple individual stars in the tract."

            aggregation: "the process of reducing multiple input values to a single output. For example, we can sum or average a metric computed over patches to produce an aggregate metric at tract level."

            typos:

            (in GPFS: "aslo" -> also)
            releaseable -> releasable
            in metric: "metric as a measurable QUANTITY which..."
            model metric: "coefficients"

            mgraham Melissa Graham added a comment -

            aggregate metric – updated to say "...the overall photometric repeatability for a particular tract given the repeatability of multiple individual stars in the tract."

            aggregation – updated to say "The process of reducing multiple input values to a single output, e.g., a metric value, computed from a collection of input values."

            Unthinkingly, in my last comment I pasted the original definitions, straight from DMTN-085, into the "Incorporated with little to no modification" section, instead of the lightly amended versions that appear in the QA_Definitions_061319.numbers file. So all those typos are already fixed. Sorry to have made you re-review.

            mgraham Melissa Graham added a comment -

            The new terms have been added to the end of glossarydefs.csv in lsst-texmf, and a pull request has been opened with Wil O'Mullane as reviewer (because that was who GitHub suggested).

            Once that PR is complete, I will close this ticket.

            mgraham Melissa Graham added a comment -

            Wil O'Mullane has merged and closed tickets/DM-20011 for lsst-texmf (issue #173; thank you!), and I'm setting this ticket to done.


              People

              • Assignee:
                mgraham Melissa Graham
                Reporter:
                lguy Leanne Guy
                Reviewers:
                Eric Bellm
                Watchers:
                Eric Bellm, Leanne Guy, Melissa Graham, Tim Jenness, Wil O'Mullane