DM-7043

Update SQuaSH database model and JSON API with concepts from validate_drp measurement API


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: QA
    • Labels:

      Description

      In DM-6629, a new measurement API was introduced into validate_drp that established a JSON data model for metrics, specifications of metrics, measurements, and general blob datasets. The intent of that work is to enable rich plots in the SQUASH dashboard, with access to data behind measurements. The new data model also clarifies the subtleties of metric specifications (filter dependence, and dependence on other specifications). This ticket will incorporate validate_drp’s new data model into the SQUASH database and API.

      Also related to DM-7041, which will update the post-qa tool that submits validate_drp json to the SQUASH API.

        Attachments

          Issue Links

            Activity

            jsick Jonathan Sick created issue -
            jsick Jonathan Sick made changes -
            Field Original Value New Value
            Epic Link DM-6196 [ 24712 ]
            jsick Jonathan Sick made changes -
            Link This issue is triggered by DM-6629 [ DM-6629 ]
            jsick Jonathan Sick made changes -
            Link This issue is triggered by DM-7041 [ DM-7041 ]
            jsick Jonathan Sick made changes -
            Link This issue is triggered by DM-7041 [ DM-7041 ]
            jsick Jonathan Sick made changes -
            Link This issue has to be started together with DM-7041 [ DM-7041 ]
            afausti Angelo Fausti made changes -
            Assignee Jonathan Sick [ jsick ] Angelo Fausti [ afausti ]
            frossie Frossie Economou made changes -
            Epic Link DM-6196 [ 24712 ] DM-5504 [ 23337 ]
            afausti Angelo Fausti made changes -
            Epic Link DM-5504 [ 23337 ] DM-8478 [ 28110 ]
            afausti Angelo Fausti added a comment - - edited

            Jonathan Sick currently the job JSON has a structure like this

             
             "measurements": [
                            {
                                "metric": "AM1",
                                "value": 7.15136555363356
                            },
                            {
                                "metric": "AM2",
                                "value": 6.80681963522785
                            },
                            {
                                "metric": "PA1",
                                "value": 14.9064428565398
                            }
                        ]
             
            
            

            I imagine replacing the scalar measurement by the new measurement JSON:

            data['measurements'][0].keys()
             
            [u'blobs',
             u'parameters',
             u'metric',
             u'value',
             u'extras',
             u'spec_name',
             u'filter_name',
             u'identifier',
             u'unit']
            
            

            For measurements that are made in different filters, or that depend on them, we can have multiple measurements for
            the same metric. That means we should have a list of measurements for each metric, e.g.

             
            "measurements": [
                            {
                                "metric": "AM1",
                    --->       "measurement": []   <---
                            },
                            {
                                "metric": "AM2",
                                "measurement": []
                            },
                            {
                                "metric": "PA1",
                                "measurement": []
                            }
                        ]
             
            
            

            For the datasets produced for each job we also need the blob JSON, example:

             
            data['blobs'][0].keys()
             
            [u'identifier', u'data', u'name']
             
            
            

             
            {
                        "ci_id": "2",
                        "ci_name": "demo",
                        "ci_dataset": "cfht",
                        "ci_label": "centos-7",
                        "date": "2016-06-02T05:21:57.298935Z",
                        "ci_url": "https://ci.lsst.codes/job/validate_drp/dataset=cfht,label=centos-7/2/",
                        "status": 0,
               --->     "blobs": {}     <---
                        "measurements": [
                            {
                                "metric": "AM1",
                                "measurement": [ ] 
                            },
                            {
                                "metric": "AM2",
                                "measurement": [ ]
                            },
                            {
                                "metric": "PA1",
                                "measurement": [  ]
                            }
                        ],
             
            
            

            If that sounds reasonable, I can mock it to continue development.

            The important thing for me now is to be able to retrieve the measurement JSON from the SQUASH API given the ci_id,
            ci_dataset and the metric, and then call a URL to load the corresponding bokeh app:

            https://angelo-squash-bokeh.lsst.codes/photometry?metric=PA1&ci_dataset=cfht&ci_id=1
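
A minimal sketch of how such a URL could be assembled from the three lookup keys. The function name `bokeh_app_url` and the `BOKEH_BASE_URL` constant are hypothetical; only the host, the `photometry` path and the query parameter names come from the example URL above.

```python
from urllib.parse import urlencode

# Host taken from the example URL in the comment above.
BOKEH_BASE_URL = "https://angelo-squash-bokeh.lsst.codes"


def bokeh_app_url(app, metric, ci_dataset, ci_id, base_url=BOKEH_BASE_URL):
    """Build the URL of a bokeh app for one metric of one CI job.

    Hypothetical helper: the query parameters mirror the example URL
    (metric, ci_dataset, ci_id), in that order.
    """
    query = urlencode(
        {"metric": metric, "ci_dataset": ci_dataset, "ci_id": ci_id}
    )
    return f"{base_url}/{app}?{query}"


print(bokeh_app_url("photometry", "PA1", "cfht", 1))
```

Using `urlencode` rather than string concatenation keeps the query string valid if a dataset or metric name ever contains characters that need escaping.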
            
            

            jsick Jonathan Sick added a comment -

            In the final JSON sample for the full REST JSON, blobs is an object/dict. Do you want the keys of this dict to be the blob identifier? This would make it easy to look up a blob from a measurement.

            If so, the actions that post-qa needs to take to shim its native format to the SQUASH format are:

            1. Convert the blobs array to an object keyed by identifier.
            2. Convert the measurements array to an array of objects with fields: 1) metric name, and 2) array of corresponding measurements.

            Thinking of the last one, it may make more sense to simply make measurements an object keyed by metric names,

            {
              "measurements": {
                "AM1": [],  <- array of measurement objects
                ...
              }
            }
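
The shim described above could look roughly like the following sketch. The function name `to_squash_format` is hypothetical and this is not the actual post-qa code. It assumes each blob carries an `identifier` key and that each measurement's `metric` entry is either the metric name or an object with a `name` field.

```python
from collections import defaultdict


def metric_name(measurement):
    """Return the metric name, whether 'metric' is a string or a dict.

    Assumption: when 'metric' is a dict, its name lives under 'name'.
    """
    metric = measurement["metric"]
    return metric["name"] if isinstance(metric, dict) else metric


def to_squash_format(job):
    """Reshape a job dict: blobs keyed by identifier, measurements by metric."""
    blobs = {blob["identifier"]: blob for blob in job.get("blobs", [])}

    measurements = defaultdict(list)
    for measurement in job.get("measurements", []):
        measurements[metric_name(measurement)].append(measurement)

    out = dict(job)  # shallow copy, so the input job is left untouched
    out["blobs"] = blobs
    out["measurements"] = dict(measurements)
    return out
```

Keying measurements by metric name lets a metric measured in several filters carry one measurement object per filter under the same key.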
            

            afausti Angelo Fausti added a comment - - edited

            I agree that having the blobs object keyed by identifier will make lookups easier. Making the measurements object keyed by metric name is also better.
            Looks good, thanks for the suggestions.

            NOTE: the above conversation predates what is currently implemented. The current proposal for the model and serializer changes is slightly different.

            afausti Angelo Fausti made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            afausti Angelo Fausti made changes -
            Link This issue is parent task of DM-8414 [ DM-8414 ]
            afausti Angelo Fausti added a comment - - edited

            New schema proposed for the SQuaSH database; see the corresponding changes in the models and serializers.

            • new field data in the Job table to store datasets produced by the job (JSON blob)
            • new field metadata in the Measurement table (JSON blob)
            • renamed condition to operator in the Metric table to conform with the validate_base metric definition
            • renamed the units field to unit in the Metric table
            • accept an empty string in the unit field (some metrics may not have a unit)
            • new fields in the Metric table: parameters, specs and reference, to conform with the validate_base metric definition (JSON blobs)
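
For orientation, the proposed fields can be pictured with plain Python dataclasses. This is only an illustrative sketch, not the project's actual models or serializers: the class layout, the `metric` name fields, the `description` field and all type annotations are assumptions. Only the field names data, metadata, operator, unit, parameters, specs and reference come from the list above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Metric:
    metric: str                      # metric name, e.g. "PA1" (field name assumed)
    unit: str = ""                   # empty string allowed: some metrics have no unit
    description: str = ""            # assumed field, not in the list above
    operator: str = ""               # formerly "condition"
    parameters: Any = None           # JSON blob
    specs: Any = None                # JSON blob
    reference: Any = None            # JSON blob


@dataclass
class Job:
    ci_id: str
    ci_dataset: str
    data: Dict[str, Any] = field(default_factory=dict)  # datasets produced by the job


@dataclass
class Measurement:
    metric: str                      # name of the related Metric (assumed linkage)
    value: float
    metadata: Dict[str, Any] = field(default_factory=dict)  # JSON blob
```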

            afausti Angelo Fausti made changes -
            Link This issue relates to DM-8976 [ DM-8976 ]
            afausti Angelo Fausti made changes -
            Attachment QA-0.png [ 29035 ]
            afausti Angelo Fausti added a comment - - edited

            Here is what the API endpoints look like after changing the model and serializers.

            Note that measurement metadata is not present in the Metrics App endpoint. We kept only the fields this app requires, minimizing the amount of data returned by this endpoint.

            afausti Angelo Fausti made changes -
            afausti Angelo Fausti made changes -
            afausti Angelo Fausti made changes -
            Comment [ ^Screen Shot 2017-01-18 at 1.51.41 PM.png ]
            afausti Angelo Fausti made changes -
            afausti Angelo Fausti added a comment - See also PR https://github.com/lsst-sqre/qa-dashboard/pull/31
            afausti Angelo Fausti made changes -
            Reviewers Jonathan Sick [ jsick ]
            Status In Progress [ 3 ] In Review [ 10004 ]
            afausti Angelo Fausti made changes -
            Story Points 5
            afausti Angelo Fausti made changes -
            Link This issue relates to DM-9034 [ DM-9034 ]
            jsick Jonathan Sick added a comment -

            This looks fine. Some comments are in the PR. The main need I see is documenting the schema of these JSON fields (maybe in SQR-008) so that it's clear what data is in them (there's a slight difference between the json schema made by validate_base and what finally ends up in these JSON fields and it's good to make it clear).

            jsick Jonathan Sick made changes -
            Status In Review [ 10004 ] Reviewed [ 10101 ]
            afausti Angelo Fausti added a comment -

            Thanks, yes, I will document that properly in the SQuaSH technote. Basically, the mapping of the validate_base Job to the SQUASH API is the following:

            1) the content of blobs is ingested into the Job.data field (it makes sense to change the field name to blobs)
            2) the content of measurements is ingested into the Measurement.metadata field, with the exception of the metric specification, which is ingested into the Metric table for normalization
            3) the measurement value is kept in a separate field in the Measurement table for convenience
            4) the metric specification has unit, description, operator, specs and reference, which are all individual fields in the Metric table.
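
The four mapping rules above can be sketched as a small function over plain dicts. This is a hedged illustration, not the actual ingestion code: the name `split_job` and the shape of the returned dict are hypothetical. It assumes the metric specification is embedded as a dict (with a `name` key) under each measurement's `metric` key. Whether the value also stays inside metadata is not stated in the ticket; this sketch leaves it there and copies it out.

```python
# Fields of the metric specification that become Metric columns (rule 4).
METRIC_FIELDS = ("unit", "description", "operator", "specs", "reference")


def split_job(job):
    """Split a validate_base-style job dict into SQuaSH-style records."""
    metrics = {}
    measurements = []

    for measurement in job.get("measurements", []):
        spec = measurement["metric"]  # assumed: dict holding the metric specification
        name = spec["name"]

        # Rules 2 and 4: the specification is normalized into the Metric table.
        metrics[name] = {key: spec.get(key) for key in METRIC_FIELDS}

        # Rule 2: everything except the specification goes to metadata.
        metadata = {k: v for k, v in measurement.items() if k != "metric"}

        # Rule 3: the value is also kept in its own field for convenience.
        measurements.append(
            {"metric": name, "value": measurement["value"], "metadata": metadata}
        )

    # Rule 1: the content of blobs is stored in Job.data.
    return {
        "job": {"data": job.get("blobs", [])},
        "metrics": metrics,
        "measurements": measurements,
    }
```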

            afausti Angelo Fausti added a comment -

            Applied PR comments

            afausti Angelo Fausti made changes -
            Resolution Done [ 10000 ]
            Status Reviewed [ 10101 ] Done [ 10002 ]

              People

              Assignee:
              afausti Angelo Fausti
              Reporter:
              jsick Jonathan Sick
              Reviewers:
              Jonathan Sick
              Watchers:
              Angelo Fausti, Jonathan Sick
