Data Management / DM-35435

Add option for CSV output of NULL values to dax_obscore

Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • None
    • dax_obscore
    • 1
    • Sprint: DB_F22_6
    • Data Access and Database
    • No

    Description

      dax_obscore CSV output writes an empty value for NULL/None values. MySQL expects the \N token for NULL when importing CSV, so currently we have to post-process the CSV and substitute \N for the empty values. It would be nice to output the expected value directly, without post-processing.
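The post-processing step described above can be sketched in Python. This is a minimal, hypothetical illustration (the name `nulls_to_mysql` is not from dax_obscore), assuming the writer encloses every string value in double quotes, writes NULL as a bare empty field, and ends records with "\n". Under those assumptions an unquoted empty field means NULL, while a genuine empty string appears as "". Unlike a line-based sed substitution, it tracks quoting, so commas and newlines inside quoted values are left alone.

```python
def nulls_to_mysql(text: str, delimiter: str = ",", null_token: str = "\\N") -> str:
    """Replace bare (unquoted) empty CSV fields with MySQL's NULL token.

    Hypothetical helper. Assumes NULLs were written as empty, unquoted fields,
    empty strings were written as "" (quoted), and records end with "\n".
    """
    out: list[str] = []
    in_quotes = False     # currently inside a double-quoted value
    field_empty = True    # nothing (not even a quote) seen in this field yet
    record_start = True   # nothing seen in the current record yet
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            out.append(ch)
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    out.append('"')    # doubled quote: escaped quote inside the value
                    i += 1
                else:
                    in_quotes = False  # closing quote of the value
        elif ch == delimiter or ch == "\n":
            if field_empty:
                out.append(null_token)  # bare empty field -> NULL token
            out.append(ch)
            field_empty = True
            record_start = ch == "\n"
        else:
            if ch == '"':
                in_quotes = True
            out.append(ch)
            field_empty = False
            record_start = False
        i += 1
    # Text that does not end with a newline still has a final field to close.
    if not record_start and field_empty:
        out.append(null_token)
    return "".join(out)


print(nulls_to_mysql('a,,"x,y"\n,"",3\n'))
# a,\N,"x,y"
# \N,"",3
```

The character-level scan is the design choice here: a regex over single lines cannot tell a NULL apart from ",," inside a quoted value, nor handle quoted values that span lines.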

        Activity

          Andy Salnikov created issue
          Andy Salnikov made changes: Summary changed from "Add option for CVS output of NULL values to dax_obscore" to "Add option for CSV output of NULL values to dax_obscore"

          Andy Salnikov added a comment: We use pyarrow for output. Its CSV writer has some options for the output format, but it does not support specifying how NULL values are written. The underlying C++ code does have that option (https://github.com/apache/arrow/blob/master/cpp/src/arrow/csv/options.h#L201), but the Python wrapper for that class does not expose it. I commented on the Arrow Jira issue about it (https://issues.apache.org/jira/browse/ARROW-16893); maybe it will be fixed one day. For now we just need a workaround.
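One possible workaround is to substitute the NULL token while writing instead of afterwards. The sketch below uses Python's standard `csv` module rather than pyarrow, purely for illustration; it is not necessarily what the dax_obscore fix does. The names `write_mysql_csv` and `null_string` are hypothetical. It also assumes the file is loaded with MySQL's default backslash escape character, so literal backslashes in string values are doubled and cannot be misread as escape sequences.

```python
import csv
import io
from typing import Any, Iterable, Optional, Sequence, TextIO


def _format(value: Any, null_string: str) -> Any:
    """Map one Python value to what should appear in the CSV field."""
    if value is None:
        return null_string  # MySQL reads an unquoted \N as NULL
    if isinstance(value, str):
        # Assumption: MySQL's default escape character (backslash) is in use,
        # so a literal backslash must be written as two backslashes.
        return value.replace("\\", "\\\\")
    return value


def write_mysql_csv(
    rows: Iterable[Sequence[Any]],
    out: TextIO,
    header: Optional[Sequence[str]] = None,
    null_string: str = "\\N",
) -> None:
    """Write rows as CSV, emitting `null_string` for None (hypothetical helper)."""
    writer = csv.writer(out, lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    for row in rows:
        writer.writerow([_format(value, null_string) for value in row])


buf = io.StringIO()
write_mysql_csv([(1, "a", None), (2, "", 0.5)], buf, header=["id", "name", "value"])
print(buf.getvalue())
# id,name,value
# 1,a,\N
# 2,,0.5
```

Note that an empty string and NULL now stay distinct in the output: the empty string is written as an empty field, and None becomes \N.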
          Andy Salnikov made changes: Status changed from To Do to In Progress

          Andy Salnikov added a comment: fritzm, I have a fix that is ready for review, though it is a bit more code than your one-line sed script. PR: https://github.com/lsst-dm/dax_obscore/pull/5
          Andy Salnikov made changes: Reviewers set to Fritz Mueller; Status changed from In Progress to In Review
          Fritz Mueller made changes: Sprint set to DB_F22_6

          Fritz Mueller added a comment: LGTM; thanks!
          Fritz Mueller made changes: Status changed from In Review to Reviewed
          Andy Salnikov made changes: Resolution set to Done; Status changed from Reviewed to Done

          People

            Assignee: Andy Salnikov
            Reporter: Andy Salnikov
            Reviewers: Fritz Mueller
            Andy Salnikov, Fritz Mueller
            Votes: 0
            Watchers: 2
