DM-35435

Add option for CSV output of NULL values to dax_obscore

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: dax_obscore
    • Labels:
    • Story Points:
      1
    • Sprint:
      DB_F22_6
    • Team:
      Data Access and Database
    • Urgent?:
      No

      Description

      dax_obscore CSV output produces an empty value for NULL/None values. MySQL needs the \N token for NULL on import from CSV, so currently we have to post-process the CSV and substitute empty values with \N. It would be nice to output the expected value without post-processing.

          Activity

          salnikov Andy Salnikov added a comment -

          We use pyarrow for output. Its CSV writer has some options for the output format, but it does not support specifying a NULL value. The C++ code does have that option (https://github.com/apache/arrow/blob/master/cpp/src/arrow/csv/options.h#L201), but the Python wrapper for that class does not expose it. I commented on the Arrow Jira about it (https://issues.apache.org/jira/browse/ARROW-16893); maybe one day it gets fixed. For now we just need a workaround.
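
          A minimal sketch of one possible workaround, not the code from the actual fix: write the rows with Python's standard csv module and substitute the \N token for None before each row is emitted. The function name write_csv_with_nulls, the NULL_TOKEN constant, and the sample column names are hypothetical and used only for illustration.

```python
import csv
import io

# Assumption for illustration: MySQL's LOAD DATA reads an unquoted \N as NULL
# (with the default escape character).
NULL_TOKEN = r"\N"


def write_csv_with_nulls(rows, header, out, null_token=NULL_TOKEN):
    """Write rows as CSV to `out`, replacing None with `null_token`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # Only None is replaced; genuine empty strings stay empty.
        writer.writerow([null_token if value is None else value for value in row])


# Hypothetical sample data with missing values in different columns.
buf = io.StringIO()
write_csv_with_nulls(
    [("obs1", None, 1.5), (None, "image", None)],
    ["obs_id", "dataproduct_type", "t_exptime"],
    buf,
)
print(buf.getvalue())
```

          Doing the substitution at write time keeps empty strings and NULLs distinguishable, which a blanket replacement of empty fields after the fact cannot do.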

          salnikov Andy Salnikov added a comment -

          Fritz Mueller, I have a fix which is ready for review, though it's a bit more code than your one-line sed script.
          PR: https://github.com/lsst-dm/dax_obscore/pull/5

          fritzm Fritz Mueller added a comment -

          LGTM; thanks!


            People

            Assignee:
            salnikov Andy Salnikov
            Reporter:
            salnikov Andy Salnikov
            Reviewers:
            Fritz Mueller
            Watchers:
            Andy Salnikov, Fritz Mueller
