Data Management / DM-7043

Update SQuaSH database model and JSON API with concepts from validate_drp measurement API


Details

    • Story
    • Status: Done
    • Resolution: Done
    • QA

    Description

      In DM-6629, a new measurement API was introduced into validate_drp that established a JSON data model for metrics, specifications of metrics, measurements, and general blob datasets. The intent of that work is to enable rich plots in the SQUASH dashboard, with access to data behind measurements. The new data model also clarifies the subtleties of metric specifications (filter dependence, and dependence on other specifications). This ticket will incorporate validate_drp’s new data model into the SQUASH database and API.

Also related to DM-7041, which will update the post-qa tool that submits validate_drp JSON to the SQUASH API.


          Activity

afausti Angelo Fausti added a comment - edited

jsick, currently the job JSON has a structure like this:

             
"measurements": [
    {
        "metric": "AM1",
        "value": 7.15136555363356
    },
    {
        "metric": "AM2",
        "value": 6.80681963522785
    },
    {
        "metric": "PA1",
        "value": 14.9064428565398
    }
]
             
            
            

I imagine replacing the scalar measurement with the new measurement JSON:

            data['measurements'][0].keys()
             
            [u'blobs',
             u'parameters',
             u'metric',
             u'value',
             u'extras',
             u'spec_name',
             u'filter_name',
             u'identifier',
             u'unit']
            
            

For measurements made in different filters, or measurements that depend on other measurements, we can have multiple measurements for the same metric. That means we should have a list of measurements for each metric, e.g.:

             
"measurements": [
    {
        "metric": "AM1",
--->    "measurement": []   <---
    },
    {
        "metric": "AM2",
        "measurement": []
    },
    {
        "metric": "PA1",
        "measurement": []
    }
]
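As a sketch, the grouping proposed above could be done with a small Python shim (hypothetical code; the field names follow the measurement JSON keys listed earlier, and the sample values are invented for illustration):

```python
# Hypothetical sketch: collect per-filter measurements into a list per
# metric, producing the "measurement": [...] shape proposed above.
from collections import defaultdict

# Invented sample input, using keys from the measurement JSON above.
raw = [
    {"metric": "AM1", "filter_name": "r", "value": 7.15},
    {"metric": "AM1", "filter_name": "i", "value": 6.98},
    {"metric": "PA1", "filter_name": "r", "value": 14.91},
]

grouped = defaultdict(list)
for m in raw:
    grouped[m["metric"]].append(m)

measurements = [
    {"metric": name, "measurement": items} for name, items in grouped.items()
]
```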
             
            
            

For the datasets produced by each job we also need the blob JSON, for example:

             
            data['blobs'][0].keys()
             
            [u'identifier', u'data', u'name']
             
            
            

             
{
    "ci_id": "2",
    "ci_name": "demo",
    "ci_dataset": "cfht",
    "ci_label": "centos-7",
    "date": "2016-06-02T05:21:57.298935Z",
    "ci_url": "https://ci.lsst.codes/job/validate_drp/dataset=cfht,label=centos-7/2/",
    "status": 0,
--->"blobs": {},    <---
    "measurements": [
        {
            "metric": "AM1",
            "measurement": []
        },
        {
            "metric": "AM2",
            "measurement": []
        },
        {
            "metric": "PA1",
            "measurement": []
        }
    ]
}
             
            
            

If this sounds reasonable, I can mock it up to continue development.

The important thing for me now is to be able to retrieve the measurement JSON from the SQUASH API given the ci_id, ci_dataset and the metric, and then call a URL to load the corresponding bokeh app:

            https://angelo-squash-bokeh.lsst.codes/photometry?metric=PA1&ci_dataset=cfht&ci_id=1
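For illustration, composing that URL could look like the following sketch (the function name is hypothetical; the base URL is the one quoted above):

```python
# Hypothetical sketch: compose the bokeh app URL for one measurement.
from urllib.parse import urlencode

BOKEH_BASE = "https://angelo-squash-bokeh.lsst.codes/photometry"

def bokeh_app_url(metric, ci_dataset, ci_id):
    """Build the URL that loads the bokeh app for a given measurement."""
    query = urlencode({"metric": metric, "ci_dataset": ci_dataset, "ci_id": ci_id})
    return f"{BOKEH_BASE}?{query}"
```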
            
            


jsick Jonathan Sick added a comment

In the final JSON sample for the full REST JSON, blobs is an object/dict. Do you want the keys of this dict to be the blob identifier? That would make it easy to look up from a measurement.

If so, the actions post-qa needs to take to shim its native format into the SQUASH format are:

1. Convert the blobs array to an object keyed by identifier.
2. Convert the measurements array to an array of objects with two fields: 1) the metric name, and 2) an array of the corresponding measurements.

Thinking about the last one, it may make more sense to simply make measurements an object keyed by metric name:

{
  "measurements": {
    "AM1": [],  <- array of measurement objects
    ...
  }
}
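The conversion steps above can be sketched as follows (a hypothetical Python shim, not the actual post-qa code; the input shapes follow the JSON samples in this thread):

```python
# Hypothetical sketch of the post-qa shim: key blobs by identifier and
# group measurements by metric name.
def shim_job(job):
    shimmed = dict(job)
    # 1. Convert the blobs array to an object keyed by identifier.
    shimmed["blobs"] = {b["identifier"]: b for b in job["blobs"]}
    # 2. Convert the measurements array to an object keyed by metric name,
    #    each value being the array of corresponding measurements.
    measurements = {}
    for m in job["measurements"]:
        measurements.setdefault(m["metric"], []).append(m)
    shimmed["measurements"] = measurements
    return shimmed
```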
            

afausti Angelo Fausti added a comment - edited

I agree that having the blobs object keyed by identifier will make lookups easier. Making the measurements object keyed by metric name is also better.
Looks good, thanks for the suggestions.

NOTE: the above conversation predates what is currently implemented. The current proposal for the model and serializer changes is slightly different.

afausti Angelo Fausti added a comment - edited

New schema proposed for the SQuaSH database; see the corresponding changes in the models and serializers.

• new field data in the Job table to store datasets produced by the job (JSON blob)
• new field metadata in the Measurement table (JSON blob)
• renamed condition to operator in the Metric table to conform with the validate_base metric definition
• renamed the units field to unit in the Metric table
• accept an empty string in the unit field (some metrics have no unit)
• new fields parameters, specs and reference in the Metric table to conform with the validate_base metric definition (JSON blobs)
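To make the proposed schema concrete, here is a sketch using plain Python dataclasses; the real implementation uses Django models and serializers in qa-dashboard, the JSON blobs are stored as text columns there, and the defaults below are assumptions. Field names come from the list above.

```python
# Hypothetical sketch of the proposed schema; not the actual Django models.
from dataclasses import dataclass, field

@dataclass
class Metric:
    metric: str
    description: str = ""
    unit: str = ""        # empty string allowed: some metrics have no unit
    operator: str = ""    # renamed from `condition`
    parameters: dict = field(default_factory=dict)  # JSON blob
    specs: list = field(default_factory=list)       # JSON blob
    reference: dict = field(default_factory=dict)   # JSON blob

@dataclass
class Job:
    ci_id: str
    data: dict = field(default_factory=dict)  # datasets produced by the job (JSON blob)

@dataclass
class Measurement:
    metric: str
    value: float
    metadata: dict = field(default_factory=dict)  # JSON blob
```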

afausti Angelo Fausti added a comment - edited

Here is how the API endpoints look after changing the models and serializers.

Notice that measurement metadata is not present in the Metrics App endpoint. We kept just the fields required by this app, minimizing the amount of data returned by this endpoint.

            afausti Angelo Fausti added a comment - See also PR https://github.com/lsst-sqre/qa-dashboard/pull/31

jsick Jonathan Sick added a comment

This looks fine. Some comments are in the PR. The main need I see is documenting the schema of these JSON fields (maybe in SQR-008) so that it's clear what data is in them (there's a slight difference between the JSON schema produced by validate_base and what finally ends up in these JSON fields, and it's good to make that clear).


afausti Angelo Fausti added a comment

Thanks, yes, I will document that properly in the SQuaSH technote. The mapping of a validate_base Job to the SQUASH API is the following:

1) the content of blobs is ingested into the Job.data field (it makes sense to rename the field to blobs)
2) the content of measurements is ingested into the Measurement.metadata field, with the exception of the metric specification, which is ingested into the Metric table for normalization
3) the measurement value is kept in a separate field in the Measurement table for convenience
4) the metric specification has unit, description, operator, specs and reference, which are all individual fields in the Metric table.
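The four mapping steps above can be sketched in Python (hypothetical code; plain dicts stand in for the actual ORM rows, and the "spec" key holding the metric specification inside each measurement is an assumption about the payload layout):

```python
# Hypothetical sketch of mapping a validate_base Job payload into the
# SQuaSH tables, following steps 1-4 above.
SPEC_FIELDS = ("unit", "description", "operator", "specs", "reference")

def ingest(job_json):
    # 1) blobs -> Job.data
    job_row = {"data": job_json["blobs"]}
    metric_rows, measurement_rows = {}, []
    for m in job_json["measurements"]:
        spec = m.get("spec", {})  # assumed location of the metric specification
        # 4) specification fields become individual columns in the Metric table
        metric_rows[m["metric"]] = {f: spec.get(f) for f in SPEC_FIELDS}
        # 2) everything except the value and the spec goes to Measurement.metadata
        metadata = {k: v for k, v in m.items() if k not in ("value", "spec")}
        # 3) the scalar value is kept in its own column
        measurement_rows.append(
            {"metric": m["metric"], "value": m["value"], "metadata": metadata}
        )
    return job_row, metric_rows, measurement_rows
```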


afausti Angelo Fausti added a comment

Applied PR comments.

People

  afausti Angelo Fausti
  jsick Jonathan Sick