  1. Data Management
  2. DM-32022

add index to column list

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Follow what Science Pipelines does in its pandas.DataFrame operations and add the index to the column list.

        Attachments

          Issue Links

            Activity

            hchiang2 Hsin-Fang Chiang added a comment - edited

            In the object table parquet files that Science Pipelines outputs, objectId is the DataFrame index. We need to call df.reset_index() to move it into the columns, as in this commit: https://github.com/lsst/ci_imsim/commit/46efcb2910c5fee7b4ade06664a4224d57e5d7c7
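
            The effect of reset_index can be sketched on a small in-memory frame. This is only an illustration: the ra and dec columns and the ID values are made up, and only objectId as the index name comes from the comment above.

            ```python
            import pandas as pd

            # Hypothetical stand-in for an object table; "ra" and "dec" and the
            # ID values are invented for illustration.
            df = pd.DataFrame(
                {"ra": [10.0, 10.5], "dec": [-5.0, -5.2]},
                index=pd.Index([1001, 1002], name="objectId"),
            )

            # objectId lives in the index, so it is absent from the column list.
            columns_before = list(df.columns)

            # reset_index() moves the named index into a regular leading column
            # and replaces the index with a default RangeIndex.
            flat = df.reset_index()
            columns_after = list(flat.columns)

            print(columns_before)
            print(columns_after)
            ```

            Because the index is named objectId, the new column takes that name; an unnamed index would instead become a column called "index".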

            hchiang2 Hsin-Fang Chiang added a comment -

            Fritz Mueller, could you please review this: https://github.com/lsst-dm/parquet_tools/pull/4

            Science Pipelines uses objectId/sourceId as the pandas DataFrame index and therefore drops them from the regular columns. Since we want those columns, we need to add them back explicitly.

            hchiang2 Hsin-Fang Chiang added a comment -

            Fritz Mueller, I changed this so that it is controlled by an argument rather than tied to the Science Pipelines assumption. What do you think?
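
            One way an argument-controlled version could look is sketched below. The function name to_csv_text and the add_index parameter are hypothetical and are not taken from parquet_tools or pq2csv; the sketch only shows the idea of making the index promotion opt-in instead of assuming the Science Pipelines layout.

            ```python
            import io

            import pandas as pd


            def to_csv_text(df: pd.DataFrame, add_index: bool = False) -> str:
                """Hypothetical helper: serialize df to CSV text.

                When add_index is True, the DataFrame index is first promoted to a
                regular column; otherwise the index is left out of the output.
                """
                if add_index:
                    df = df.reset_index()
                buf = io.StringIO()
                # index=False: after the optional reset, only regular columns are written.
                df.to_csv(buf, index=False)
                return buf.getvalue()


            # Illustrative data only; the column names and values are invented.
            table = pd.DataFrame(
                {"ra": [10.0], "dec": [-5.0]},
                index=pd.Index([1001], name="objectId"),
            )
            header_default = to_csv_text(table).splitlines()[0]
            header_with_index = to_csv_text(table, add_index=True).splitlines()[0]
            ```

            With the default, the header contains only ra and dec; passing add_index=True puts objectId first, so the caller decides explicitly rather than the tool guessing where the ID lives.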

            abh Andy Hanushevsky added a comment -

            After much discussion, it was decided that nothing needs to be changed, because adding an index column in pq2csv does not change what is actually written. In fact, the input parquet file should have had this done before being passed to pq2csv.


              People

              Assignee:
              hchiang2 Hsin-Fang Chiang
              Reporter:
              hchiang2 Hsin-Fang Chiang
              Reviewers:
              Fritz Mueller
              Watchers:
              Andy Hanushevsky, Fritz Mueller, Hsin-Fang Chiang
