Parquet files are already used within DM for a variety of purposes and are included in the latest sizing model (DMTN-135).
The adoption of a columnar data format such as Parquet as a user-facing data format, particularly to support next-to-data analysis and to provide MapReduce-style access to catalog data, has been discussed at several DMLT and SST meetings and at several user-facing meetings (e.g. the LSP FDR). The idea has received broad acceptance within DM, but it is not stated in any DM policy, design, or architecture document. The community is now asking what our plans are.
This RFC proposes that DM support a columnar format for catalog data (Parquet at this time) in addition to ADQL (Qserv), and that DM documents be updated to reflect this. Documents that would need updating include:
- DPDD - should contain a short section explaining to the user community that catalog data will be available both through ADQL (Qserv) and in a columnar format (Parquet)
- LDM-148 - include in the DM architecture
- ... others?