Fix Version/s: None
Follow what Science Pipelines do on pandas.DataFrame operations and add index to column list
- is triggered by
DM-31825 Parquet Table clean up before DP0.2 Preliminary run.
Fritz Mueller, could you please review this: https://github.com/lsst-dm/parquet_tools/pull/4
The Science Pipelines use objectId/sourceId as the pandas DataFrame's index and hence drop them from the regular columns. As we want those, we need to add them back explicitly.
Fritz Mueller, I modified this so that it is controlled by an argument rather than tying it to the assumption in Science Pipelines. What do you think?
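For illustration only, a minimal sketch of what an argument-controlled column list could look like. The helper name `column_list` and the flag `include_index` are hypothetical and are not taken from parquet_tools or the pull request:

```python
import pandas as pd


def column_list(df: pd.DataFrame, include_index: bool = False) -> list:
    """Return column names, optionally preceded by the named index levels.

    Hypothetical helper; not the parquet_tools implementation.
    """
    cols = [str(c) for c in df.columns]
    if include_index:
        # Unnamed index levels (e.g. the default RangeIndex) are skipped,
        # since they cannot be addressed by name.
        index_names = [n for n in df.index.names if n is not None]
        cols = index_names + cols
    return cols


# Toy table with objectId as the index, mimicking the layout described above.
df = pd.DataFrame({"objectId": [1, 2], "ra": [10.0, 20.0]}).set_index("objectId")
```

With the flag off, only the regular columns are listed; with it on, the index name is added to the front.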
After much discussion, it was decided that nothing needs to be changed, because adding an index column in pq2csv does not change what is actually written. In fact, the index should already have been moved into the columns of the input parquet file before that file is used as input to pq2csv.
In the case of the object table parquet files from Science Pipelines output, objectId is the DataFrame index. We need to call df.reset_index() to have it in the columns, as in this commit: https://github.com/lsst/ci_imsim/commit/46efcb2910c5fee7b4ade06664a4224d57e5d7c7
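A small sketch of the reset_index step on a toy DataFrame. The column names other than objectId and the values are made up for illustration; this is not the code from the linked commit:

```python
import pandas as pd

# Toy stand-in for an object table whose index is objectId.
df = pd.DataFrame(
    {"objectId": [101, 102], "ra": [10.0, 20.0], "dec": [-5.0, 5.0]}
).set_index("objectId")

# While objectId is the index, it does not appear among the regular columns.
index_is_hidden = "objectId" not in df.columns

# reset_index() moves the index back into a regular column and
# replaces it with a default integer index.
flat = df.reset_index()
```

After this step, `flat` can be written out with objectId as an ordinary column.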