Data Management / DM-18353

Add pre-computed values to parquet tables output by pipe_analysis scripts

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: pipe_analysis
    • Labels: None
    • Story Points: 6
    • Epic Link:
    • Sprint: DRP S19-4, DRP S19-5
    • Team: Data Release Production

      Description

      To facilitate the interactive/dashboard QA work of Tim Morton, the pipe_analysis scripts have the option to write out parquet tables for the various datasets at the visit and tract level.  Currently, the tables contain only the information in the source catalogs, plus an extra column indicating whether a given source is deemed suitable for QA analyses and columns identifying the ccd & visit or patch & tract of each object.  It would also be very useful for the interactive plotting to have a set of columns with pre-computed values of interest (e.g. mag differences, reference catalog comparisons, etc.).  These are also the per-object values from which various (aggregated) metrics will be computed.  The strategy for now is to add a column to the parquet tables for every value that is computed and plotted in the pipe_analysis scripts.
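
      As a rough illustration of the strategy described above (not the pipe_analysis implementation), the sketch below adds one pre-computed column, a PSF minus CModel magnitude difference, to a pandas table before it would be written out. The column names psfFlux, cModelFlux, qaGood and psfMinusCModelMag, and both helper functions, are hypothetical and chosen only for this example.

```python
import numpy as np
import pandas as pd


def flux_to_mag(flux):
    """Convert flux to magnitude with no zero point applied.

    The zero point cancels when two magnitudes of the same source are
    differenced. Non-positive fluxes map to NaN.
    """
    flux = np.asarray(flux, dtype=float)
    return -2.5 * np.log10(np.where(flux > 0, flux, np.nan))


def add_precomputed_columns(catalog):
    """Return a copy of the catalog with derived per-object columns added."""
    out = catalog.copy()
    # Hypothetical column names; a real catalog would use its own schema.
    out["psfMinusCModelMag"] = (
        flux_to_mag(out["psfFlux"]) - flux_to_mag(out["cModelFlux"])
    )
    return out


# Toy stand-in for a source catalog, including a QA suitability flag.
catalog = pd.DataFrame({
    "psfFlux": [100.0, 250.0, -3.0],
    "cModelFlux": [100.0, 2500.0, 40.0],
    "qaGood": [True, True, False],
})
table = add_precomputed_columns(catalog)

# Writing would then be a single call; it needs pyarrow or fastparquet:
# table.to_parquet("visit.parq")
```

      Keeping the derived values as ordinary columns means a dashboard can select or aggregate them without knowing how they were computed.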

            Activity

            lauren Lauren MacArthur added a comment -

            Ok, give that a try.

            tmorton Tim Morton [X] (Inactive) added a comment -

            OK, that ran, but now it seems like there aren't any outputs created?

            I ran

            visitAnalysis.py /datasets/hsc/repo/rerun/RC/w_2019_10/DM-17940/ --output /project/tmorton/tickets/DM-18353 --tract 9813 --id filter=HSC-R visit=1202 --no-versions --config writeParquetOnly=True

            and it did things and looked like it finished with no errors, but there is no output:

            (lsst-scipipe) [tmorton@lsst-dev01 pipe_analysis]$ ls /project/tmorton/tickets/DM-18353
            repositoryCfg.yaml
            

            tmorton Tim Morton [X] (Inactive) added a comment -

            Ah, my bad-- I hadn't set up qa_explorer so tables weren't written.  Trying again.

            tmorton Tim Morton [X] (Inactive) added a comment -

            OK, looks good-- My only minor concern with this implementation is that it computes the same quantities twice (once for the tables, and once for the plots), such that if anything changes, we'd have to change it in two places.  But I'm OK with this for now given that I've gotta get moving on preparing this sample repo for the contractors.

            lauren Lauren MacArthur added a comment - edited

            I totally agree and this is front and center for the pipe_analysis overhaul (likely to be led by Sophie Reed), so it's not worth refactoring this now. Ideally, we will write out the parquet tables with all the relevant info and then just read in those for the plotting scripts.

            I created a PR for your approval.
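
            A minimal sketch of the compute-once idea discussed in this thread, for illustration only and not taken from the PR: each derived quantity is defined in one place, written into the table, and the plotting side reads the stored column instead of recomputing it. The registry DERIVED_COLUMNS, the functions build_table and values_for_plot, and the column names mag, refMag and qaGood are all hypothetical.

```python
import pandas as pd

# Hypothetical registry: each derived column is defined exactly once,
# so a change to a quantity only has to be made here.
DERIVED_COLUMNS = {
    "magDiff": lambda cat: cat["mag"] - cat["refMag"],
    "magDiffMmag": lambda cat: 1000.0 * (cat["mag"] - cat["refMag"]),
}


def build_table(catalog):
    """Return a copy of the catalog with every registered column added."""
    table = catalog.copy()
    for name, func in DERIVED_COLUMNS.items():
        table[name] = func(catalog)
    return table


def values_for_plot(table, name, flag="qaGood"):
    """Read a stored column for the flagged sources; nothing is recomputed."""
    return table.loc[table[flag], name].to_numpy()


# Toy catalog with a reference-catalog magnitude per object.
catalog = pd.DataFrame({
    "mag": [20.0, 21.5, 19.0],
    "refMag": [20.25, 21.5, 18.0],
    "qaGood": [True, True, False],
})
table = build_table(catalog)
good = values_for_plot(table, "magDiff")
```

            With this layout the table writer and the plots cannot drift apart, since the plots only ever see what was stored in the table.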


              People

              Assignee:
              lauren Lauren MacArthur
              Reporter:
              lauren Lauren MacArthur
              Reviewers:
              Tim Morton [X] (Inactive)
              Watchers:
              Lauren MacArthur, Tim Morton [X] (Inactive), Yusra AlSayyad
