# Handling nan values in verify JSON outputs

XMLWordPrintable

#### Details

• Type: Story
• Status: Won't Fix
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
1.4
• Team:
SQuaRE

#### Description

Jonathan Sick Simon Krughoff when testing dispatch_verify.py with the Flask based SQuaSH RESTful API I noticed some nan values in the verify output measurements and blobs.

They are present for example in the test data I have in the squash-rest-api repository and the problem can be reproduced using the example notebook at:

https://github.com/lsst-sqre/squash-rest-api/blob/master/tests/test_api.ipynb

According to the JSON RFC4627 https://tools.ietf.org/html/rfc4627#section-2.4

 "Numeric values that cannot be represented as sequences of digits  (such as Infinity and NaN) are not permitted" 

and in fact SQLAlchemy cannot handle the nan in data blobs that get stored in the MySQL JSON field. Here's an exemple of error message in that case:

   sqlalchemy.exc.InvalidRequestError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (pymysql.err.InternalError) (3140, 'Invalid JSON text: "Invalid value." at position 388 in value for column \'blob.data\'.') [SQL: 'INSERT INTO blob (identifier, name, data, job_id) VALUES (%(identifier)s, %(name)s, %(data)s, %(job_id)s)'] [parameters: {'identifier': 'aa2d119283944b99a56841c48a6d5967', 'name': 'validate_drp.AF3_minimum', 'data': '{"annulus": {"description": "Inner and outer radii of selection annulus.", "unit": "arcmin", "value": [199.0, 201.0], "label": "annulus radii"}, "rms ... (464 characters truncated) ... e": [17.0, 21.5], "label": null}, "D": {"description": "Radial distance of annulus (arcmin)", "unit": "arcmin", "value": 200.0, "label": "Distance"}}', 'job_id': 2}] (Background on this error at: http://sqlalche.me/e/2j85)  

Currently I am handling that in SQuaSH, which looks ugly:

https://github.com/lsst-sqre/squash-rest-api/blob/master/app/models.py#L364

and

https://github.com/lsst-sqre/squash-rest-api/blob/master/app/models.py#L413

Could we solve this up front to make sure that we don't have any nan or Infinity values in the verify outputs?

Perhaps if nan gets replaced by null in the verify outputs then I can modify the value field (float) in the measurement table to accept a MySQL null values. And we should not have any problem with data blobs that have null values since they are stored in a MySQL JSON field.

#### Activity

Hide
Jonathan Sick added a comment -

Yeah, I think we can coerce to None/null on the lsst.verify side, perhaps as a step we do after building the JSON data structure.

Show
Jonathan Sick added a comment - Yeah, I think we can coerce to None / null on the lsst.verify side, perhaps as a step we do after building the JSON data structure.
Hide
Krzysztof Findeisen added a comment -

By the same token, the lsst.verify persistence code should handle infinities. We've already run into one case where an insufficiently defensive metric calculation could give +/-Inf.

Show
Krzysztof Findeisen added a comment - By the same token, the lsst.verify persistence code should handle infinities. We've already run into one case where an insufficiently defensive metric calculation could give +/-Inf .
Hide
Jonathan Sick added a comment -

I need to refresh myself, but I think that's an issue lsst.verify's current use of JSON. Again, we could do a pre-serialization step and apply some alternative serialization of infinities.

Show
Jonathan Sick added a comment - I need to refresh myself, but I think that's an issue lsst.verify's current use of JSON. Again, we could do a pre-serialization step and apply some alternative serialization of infinities.
Hide
Angelo Fausti added a comment -

This is now fixed via DM-31131. We decided to preserve the NaN values and represent them as null in the JSON job document. In the SQuaSH database they are stored as SQLAlchemy NULL, and when writing to InfluxDB the recommended approach is to drop the NaN values.

Show
Angelo Fausti added a comment - This is now fixed via DM-31131 . We decided to preserve the NaN values and represent them as null in the JSON job document. In the SQuaSH database they are stored as SQLAlchemy NULL, and when writing to InfluxDB the recommended approach is to drop the NaN values.

#### People

Assignee:
Angelo Fausti
Reporter:
Angelo Fausti
Watchers:
Angelo Fausti, Jonathan Sick, Krzysztof Findeisen, Simon Krughoff