DM-18546 (Data Management)

Enable fastparquet as a read option for ParquetTable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Story Points:
      8
    • Epic Link:
    • Team:
      Data Release Production

      Description

      As identified in DM-18353, reading bugs in pyarrow prevent "large" parquet files from being read (the size at which this happens is not precisely known). Experimentation has shown that fastparquet can read these files successfully. This ticket is to implement an option to read a ParquetTable using fastparquet instead of pyarrow.
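A minimal sketch of what such a read option could look like, assuming a standalone helper rather than the actual ParquetTable class. The names `read_parquet_columns` and `SUPPORTED_ENGINES` are hypothetical and only illustrate switching between the two backends; the pyarrow and fastparquet calls are the libraries' standard `ParquetFile` readers.

```python
# Hypothetical helper for illustration; this is not the ParquetTable API.
SUPPORTED_ENGINES = ("pyarrow", "fastparquet")


def read_parquet_columns(path, columns=None, engine="pyarrow"):
    """Read a parquet file into a pandas DataFrame with the chosen engine.

    `columns=None` reads every column in the file.
    """
    # Validate first, so a bad engine name fails before any backend import.
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f"Unknown engine {engine!r}; expected one of {SUPPORTED_ENGINES}"
        )

    if engine == "pyarrow":
        # Imported lazily so that only the selected backend must be installed.
        import pyarrow.parquet as pq

        return pq.ParquetFile(path).read(columns=columns).to_pandas()

    # Remaining case: engine == "fastparquet".
    import fastparquet

    return fastparquet.ParquetFile(path).to_pandas(columns=columns)
```

With a helper like this, a caller that hits the pyarrow failure could retry with `engine="fastparquet"` without changing anything else about the call.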

            Activity

            tmorton Tim Morton added a comment -

            Starting to work on this has led me down this current rabbit-hole:

            https://github.com/dask/fastparquet/issues/409

            I will take a look and see if I can do this relatively easily.

            tmorton Tim Morton added a comment -

            OK, I have submitted the following PR to fastparquet, which should enable fastparquet to read our files; I will look at that next.

            https://github.com/dask/fastparquet/pull/410

            tmorton Tim Morton added a comment -

            This issue has gone away since the recent update of the stack on lsst-dev, so marking this as won't fix for now... even though a good bit of work went into making it possible through the above-referenced PR to fastparquet.

            tmorton Tim Morton added a comment -

            Changing status to "done" to reflect the work that went into this and the related necessary development contributions to fastparquet.


              People

              • Assignee:
                tmorton Tim Morton
                Reporter:
                tmorton Tim Morton
                Watchers:
                John Swinbank, Tim Morton, Yusra AlSayyad