Description
Parquet files are in use internally within DM for a variety of purposes, and have already been included in the latest sizing model (DMTN-135).
The adoption of a columnar data format, such as Parquet, as a user-facing data format, particularly to support next-to-data analysis and to provide MapReduce-style access to catalog data, has been discussed at several DMLT and SST meetings and at several user-facing meetings (e.g. the LSP FDR). The idea has received broad acceptance within DM, but it is not stated in any DM policy, design, or architecture document. The community is now asking us what our plans are.
This RFC proposes that DM agree to support a columnar format for catalog data (Parquet at this time), in addition to ADQL (Qserv), and update DM documents to reflect this. Documents that would need updating include:
- DPDD - should contain a short section explaining to the user community that catalog data will be available both via ADQL (Qserv) and in a columnar format (Parquet)
- LDM-148 - include in the DM architecture
- ... others?
Issue Links
- is triggering
  - DM-24548 Update DPDD and DMSR to state that columnar files will be made available (To Do)
  - DM-24549 Create LDM to describe user-facing parquet data products (To Do)
Preserving some discussion about "next actions" for future reference: https://lsstc.slack.com/archives/C2K6YMTK2/p1583539429112300