Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-13179

Handling nan values in verify JSON outputs

    Details

    • Type: Story
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: squash, verify
    • Labels:
      None

      Description

      Jonathan Sick Simon Krughoff when testing dispatch_verify.py with the Flask based SQuaSH RESTful API I noticed some nan values in the verify output measurements and blobs.

      They are present for example in the test data I have in the squash-rest-api repository and the problem can be reproduced using the example notebook at:

      https://github.com/lsst-sqre/squash-rest-api/blob/master/tests/test_api.ipynb

      According to the JSON RFC4627 https://tools.ietf.org/html/rfc4627#section-2.4

      "Numeric values that cannot be represented as sequences of digits
         (such as Infinity and NaN) are not permitted"
      

      and in fact SQLAlchemy cannot handle the nan in data blobs that get stored in the MySQL JSON field. Here's an exemple of error message in that case:

       
      sqlalchemy.exc.InvalidRequestError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (pymysql.err.InternalError) (3140, 'Invalid JSON text: "Invalid value." at position 388 in value for column \'blob.data\'.') [SQL: 'INSERT INTO `blob` (identifier, name, data, job_id) VALUES (%(identifier)s, %(name)s, %(data)s, %(job_id)s)'] [parameters: {'identifier': 'aa2d119283944b99a56841c48a6d5967', 'name': 'validate_drp.AF3_minimum', 'data': '{"annulus": {"description": "Inner and outer radii of selection annulus.", "unit": "arcmin", "value": [199.0, 201.0], "label": "annulus radii"}, "rms ... (464 characters truncated) ... e": [17.0, 21.5], "label": null}, "D": {"description": "Radial distance of annulus (arcmin)", "unit": "arcmin", "value": 200.0, "label": "Distance"}}', 'job_id': 2}] (Background on this error at: http://sqlalche.me/e/2j85)
      
      

      Currently I am handling that in SQuaSH, which looks ugly:

      https://github.com/lsst-sqre/squash-rest-api/blob/master/app/models.py#L364

      and

      https://github.com/lsst-sqre/squash-rest-api/blob/master/app/models.py#L413

      Could we solve this up front to make sure that we don't have any nan or Infinity values in the verify outputs?

      What's your suggestion for that?

      Perhaps if nan gets replaced by null in the verify outputs then I can modify the value field (float) in the measurement table to accept a MySQL null values. And we should not have any problem with data blobs that have null values since they are stored in a MySQL JSON field.

        Attachments

          Activity

          Hide
          jsick Jonathan Sick added a comment -

          Yeah, I think we can coerce to None/null on the lsst.verify side, perhaps as a step we do after building the JSON data structure.

          Show
          jsick Jonathan Sick added a comment - Yeah, I think we can coerce to None / null on the lsst.verify side, perhaps as a step we do after building the JSON data structure.
          Hide
          krzys Krzysztof Findeisen added a comment -

          By the same token, the lsst.verify persistence code should handle infinities. We've already run into one case where an insufficiently defensive metric calculation could give +/-Inf.

          Show
          krzys Krzysztof Findeisen added a comment - By the same token, the lsst.verify persistence code should handle infinities. We've already run into one case where an insufficiently defensive metric calculation could give +/-Inf .
          Hide
          jsick Jonathan Sick added a comment -

          I need to refresh myself, but I think that's an issue lsst.verify's current use of JSON. Again, we could do a pre-serialization step and apply some alternative serialization of infinities.

          Show
          jsick Jonathan Sick added a comment - I need to refresh myself, but I think that's an issue lsst.verify's current use of JSON. Again, we could do a pre-serialization step and apply some alternative serialization of infinities.

            People

            • Assignee:
              afausti Angelo Fausti
              Reporter:
              afausti Angelo Fausti
              Watchers:
              Angelo Fausti, Jonathan Sick, Krzysztof Findeisen, Simon Krughoff
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:

                Summary Panel