Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: dax_obscore
- Labels:
- Story Points: 1
- Sprint: DB_F22_6
- Team: Data Access and Database
- Urgent?: No
Description
The dax_obscore CSV output produces an empty value for NULL/None values. MySQL needs the \N token for NULL when importing from CSV, so currently we have to post-process the CSV and substitute empty values with \N. It would be nice to output the expected value without post-processing.
Activity
Field | Original Value | New Value |
---|---|---|
Summary | Add option for CVS output of NULL values to dax_obscore | Add option for CSV output of NULL values to dax_obscore |
Status | To Do [ 10001 ] | In Progress [ 3 ] |
Reviewers | Fritz Mueller [ fritzm ] | |
Status | In Progress [ 3 ] | In Review [ 10004 ] |
Sprint | | DB_F22_6 [ 1172 ] |
Status | In Review [ 10004 ] | Reviewed [ 10101 ] |
Resolution | | Done [ 10000 ] |
Status | Reviewed [ 10101 ] | Done [ 10002 ] |
We use pyarrow for output. Its CSV writer has some options for the output format, but it does not support specifying a NULL value. The C++ code actually has that option (https://github.com/apache/arrow/blob/master/cpp/src/arrow/csv/options.h#L201), but the Python wrapper for that class does not expose it. I commented on the Arrow Jira about it (https://issues.apache.org/jira/browse/ARROW-16893); maybe one day it will get fixed. For now we just need a workaround.
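To illustrate the kind of workaround meant here, the following is a minimal sketch that writes rows to CSV and emits a configurable token for None values. It is not the dax_obscore implementation: it uses Python's standard `csv` module rather than pyarrow, and the function name `write_csv_with_nulls` and its `null_string` parameter are hypothetical names chosen for this example.

```python
import csv
import io
from typing import Any, Iterable, Optional, Sequence, TextIO


def write_csv_with_nulls(
    rows: Iterable[Sequence[Any]],
    out: TextIO,
    header: Optional[Sequence[str]] = None,
    null_string: str = r"\N",
) -> None:
    """Write rows as CSV, replacing None with null_string.

    Hypothetical helper for illustration only; the default token is the
    \\N marker that the ticket says MySQL expects for NULL.
    """
    writer = csv.writer(out, lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    for row in rows:
        # Only None is substituted; empty strings are written unchanged,
        # so NULL and "" remain distinguishable in the output.
        writer.writerow([null_string if value is None else value for value in row])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_csv_with_nulls(
        [(1, None, "abc"), (2, 3.5, None)],
        buffer,
        header=["id", "flux", "name"],
    )
    print(buffer.getvalue(), end="")
```

Substituting at write time removes the separate post-processing pass and avoids confusing a genuinely empty string with NULL, which a search-and-replace over empty fields cannot do.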