  1. Data Management
  2. DM-18546

Enable fastparquet as a read option for ParquetTable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Story Points:
      8
    • Epic Link:
    • Team:
      Data Release Production

      Description

      As identified in DM-18353, pyarrow has read bugs that prevent "large" parquet files from being read (the exact size at which this starts is not yet known). Experimentation has shown that fastparquet can read these files successfully. This ticket is to implement an option to read a ParquetTable using fastparquet instead of pyarrow.
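
      The option described above could look roughly like the following. This is a minimal sketch, not the ParquetTable implementation: the helper name `read_parquet_table` and the `SUPPORTED_ENGINES` tuple are hypothetical, and it assumes reading goes through `pandas.read_parquet`, whose `engine` argument accepts "pyarrow" or "fastparquet".

```python
# Hypothetical sketch of an engine switch for reading parquet files.
# Names here are illustrative and are not taken from the LSST stack.

# Engines that pandas.read_parquet accepts by name.
SUPPORTED_ENGINES = ("pyarrow", "fastparquet")


def read_parquet_table(path, engine="pyarrow", columns=None):
    """Read a parquet file into a DataFrame using the chosen engine.

    path    : location of the parquet file
    engine  : "pyarrow" (default) or "fastparquet"
    columns : optional list of column names to load; None loads all
    """
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f"engine must be one of {SUPPORTED_ENGINES}, got {engine!r}"
        )
    # Imported lazily so argument validation works even when pandas
    # (or the selected engine) is not installed.
    import pandas as pd

    return pd.read_parquet(path, engine=engine, columns=columns)
```

      With this shape, a caller that hits the pyarrow read bug could retry the same file with `read_parquet_table(path, engine="fastparquet")`, provided fastparquet is installed.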

        Attachments

          Issue Links

            Activity

            tmorton Tim Morton [X] (Inactive) added a comment -

            Starting to work on this has led me down this current rabbit-hole:

            https://github.com/dask/fastparquet/issues/409

            I will take a look and see if I can do this relatively easily.

            tmorton Tim Morton [X] (Inactive) added a comment -

            OK, I have submitted the following PR to fastparquet, which should enable fastparquet to read our files...will look at that next.

            https://github.com/dask/fastparquet/pull/410

            tmorton Tim Morton [X] (Inactive) added a comment -

            This issue has gone away since the recent update of the stack on lsst-dev, so marking this as won't fix for now... even though a good bit of work went into making it possible through the above-referenced PR to fastparquet.

            tmorton Tim Morton [X] (Inactive) added a comment -

            Changing status to "done" to reflect the work that went into this and the related necessary development contributions to fastparquet.


              People

              Assignee:
              tmorton Tim Morton [X] (Inactive)
              Reporter:
              tmorton Tim Morton [X] (Inactive)
              Watchers:
              John Swinbank, Tim Morton [X] (Inactive), Yusra AlSayyad
              Votes:
              0

                Dates

                Created:
                Updated:
                Resolved:
