  1. Data Management
  2. DM-28389

Create a gen3 task to collate tract-level parquet tables for QA analyses

    Details

    • Story Points:
      6
    • Epic Link:
    • Sprint:
      DRP S21a (Dec Jan)
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      A new, gen3-only repository for QA analysis is under construction and will eventually replace the gen2 pipe_analysis version that lives in lsst-dev. Much of the functionality of the latter needs to be adapted and included in the former. A crucial first step is to create a task that aggregates the tract-level catalogs, including only the columns required by the QA plotting and analysis routines. A PipelineTask will be written to perform this aggregation in the same manner as the pipe_analysis scripts currently do, so that all subsequent pipelines in the new analysis_drp can use its output as their input catalog dataset.

            Activity

            Lauren MacArthur added a comment (edited)

            Ok, I believe I've got things working as desired.  The commit messages and docs hopefully convey the function and purpose of these new datasets.  I have persisted tables for all 6 bands in RC2 tract 9813 (COSMOS) and have checked that they all have the same number and ordering of rows as the associated objectTable_tract dataset (NaNs have been inserted for missing patches for a given band, as is done for the objectTable_tract tables). They can be accessed via, e.g.:

            In [1]: from lsst.daf.butler import Butler
            In [2]: rootDir = "/project/hsc/gen3repo/rc2w02_ssw03"
               ...: butler = Butler(rootDir, collections="u/lauren/DM-28389")
            In [3]: gQaTable_forced = butler.get("qaTractTable_forced", tract=9813, band="g")
            In [4]: objectTable_tract = butler.get("objectTable_tract", tract=9813)
            In [5]: gQaTable_forced.index.equals(objectTable_tract.index)
            Out[5]: True
            

            (the last line is the check on the matched number of rows and sorting of the catalogs, which is required to allow for ease of joint use, e.g. being able to use a single boolean expression to filter on both).
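The value of that row alignment can be illustrated with a minimal pandas sketch. This is not the code from this ticket: the toy tables, the `id` index name, and the `extendedness` and `psfFlux` columns are assumptions made up for the example, and `reindex` is just one way to obtain the NaN-filled, identically ordered table described above.

```python
import pandas as pd

# Toy stand-in for objectTable_tract: one row per object, indexed by object id.
# (Column names here are hypothetical, chosen only for illustration.)
object_table = pd.DataFrame(
    {"patch": [0, 0, 1, 1], "extendedness": [0.0, 1.0, 0.0, 1.0]},
    index=pd.Index([101, 102, 103, 104], name="id"),
)

# Toy per-band table that covers only patch 0 and is stored in a different order.
g_partial = pd.DataFrame(
    {"psfFlux": [2.5, 1.5]},
    index=pd.Index([102, 101], name="id"),
)

# Reindexing onto the object table's index restores the row order and
# inserts NaN rows for objects in patches this band does not cover.
g_qa_table = g_partial.reindex(object_table.index)

# The same check as in the session above: identical index, identical order.
assert g_qa_table.index.equals(object_table.index)

# Because the rows line up, one boolean mask filters both tables consistently.
is_star = object_table["extendedness"] < 0.5
star_objects = object_table[is_star]
star_fluxes = g_qa_table[is_star]
```

Here `star_objects` and `star_fluxes` contain the same objects in the same order, with NaN fluxes wherever the band had no coverage.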

            The pipeline command line to create these tables looks like:

            pipetask run -b /project/hsc/gen3repo/rc2w02_ssw03 -i HSC/runs/RC2/w_2021_02 -o u/lauren/DM-28389 -p $OBS_SUBARU_DIR/pipelines/makeQaTractTables.yaml -d "instrument='HSC' AND tract=9813 AND band='r' AND skymap='hsc_rings_v1'"
            

            (but leaving out the -i HSC/runs/RC2/w_2021_02 on subsequent runs.)
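For running several bands, the same command line can be assembled programmatically. The sketch below is only an illustration: `build_pipetask_command` is a hypothetical helper, not part of any LSST package, and it uses only the flags and values that appear in the command above. The input collection is passed on the first run and omitted afterwards.

```python
import os

def build_pipetask_command(repo, output, pipeline, tract, band,
                           input_collection=None,
                           instrument="HSC", skymap="hsc_rings_v1"):
    """Return the pipetask argument list for one tract and band.

    Hypothetical helper for illustration; it only mirrors the command
    shown in this comment.
    """
    query = (
        f"instrument='{instrument}' AND tract={tract} "
        f"AND band='{band}' AND skymap='{skymap}'"
    )
    args = ["pipetask", "run", "-b", repo]
    # The input collection is only given on the first run.
    if input_collection is not None:
        args += ["-i", input_collection]
    args += ["-o", output, "-p", pipeline, "-d", query]
    return args

repo = "/project/hsc/gen3repo/rc2w02_ssw03"
output = "u/lauren/DM-28389"
# expandvars leaves the variable untouched if OBS_SUBARU_DIR is not set.
pipeline = os.path.expandvars("$OBS_SUBARU_DIR/pipelines/makeQaTractTables.yaml")

first_run = build_pipetask_command(
    repo, output, pipeline, tract=9813, band="r",
    input_collection="HSC/runs/RC2/w_2021_02",
)
later_run = build_pipetask_command(repo, output, pipeline, tract=9813, band="g")
```

Each list could be handed to `subprocess.run` as is, which avoids shell quoting problems with the data query string.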

            There is a generic pipeline in analysis_drp. The override in obs_subaru is to add a filterMap between the generic "band" names and the physical HSC filters (this is required as the _obj tables are indexed on the latter).
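To show why such a map is needed, here is a toy pandas sketch. Everything in it is an assumption made for illustration: the table layout, the column level names, the `psfFlux` column, the physical filter names, and the `select_band` helper are not taken from the actual config or task code.

```python
import pandas as pd

# Hypothetical band -> physical filter map; the real filterMap config may differ.
FILTER_MAP = {"g": "HSC-G", "r": "HSC-R"}

# Toy stand-in for an _obj table whose columns are keyed by physical filter.
columns = pd.MultiIndex.from_tuples(
    [("forced_src", "HSC-G", "psfFlux"), ("forced_src", "HSC-R", "psfFlux")],
    names=("dataset", "filter", "column"),
)
obj_table = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    index=pd.Index([101, 102], name="id"),
    columns=columns,
)

def select_band(table, dataset, band):
    """Select one dataset and band, translating the band to its physical filter."""
    physical_filter = FILTER_MAP[band]
    return table[dataset][physical_filter]

# A generic band name is all the caller needs to know.
g_forced = select_band(obj_table, "forced_src", "g")
```

Without the map, a lookup by the generic band name `"g"` would fail in this toy table, because its columns only know the physical filter name.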

            Lauren MacArthur added a comment

            I'm putting you both as reviewers. I think Sophie Reed should comment on whether the tables provide what is needed for the scripts she is working on for this repository, and I think Jim Bosch (or someone suitable he can appoint) should at least weigh in on the Gen3 aspects (it's my first Gen3 task, so I very likely have some less-than-ideal approaches in there!)

            Lauren MacArthur added a comment

            I've already got some tips for improvement, so I'm putting this back to In Progress to clean things up before sending it back for review (but the data products themselves won't be changed, so feel free to play with them to see if they are of use).

            Lauren MacArthur added a comment (edited)

            Ok, back in for review. I've added TODOs to remove the filterMap config song & dance once DM-28479 lands, but let me know if it's better to hold off merging this until that gets resolved.

            I'm currently creating the tables for all three RC2 tracts (only 9813 is done so far, well, almost...still working on y, but they should all be done by tonight).  They can be accessed as indicated in the first comment on this ticket. 

            Jim Bosch added a comment

            I think I'm done reviewing (one final comment on the PR, in response to a question there). Thanks for handling all of my restructuring requests good-naturedly. I'll let Sophie Reed be the one to hit Reviewed in case she wants one more look.

            Lauren MacArthur added a comment

            Thanks, Jim!  I addressed your last comment and will now wait for a final review from Sophie Reed (which will also need approval of the obs_subaru PR if we still think it's necessary at that time).

            Sophie Reed added a comment

            Looks good to me.

            Lauren MacArthur added a comment

            Thanks to you both!  Merged and done.


              People

              Assignee:
              lauren Lauren MacArthur
              Reporter:
              lauren Lauren MacArthur
              Reviewers:
              Jim Bosch, Sophie Reed
              Watchers:
              Jim Bosch, Lauren MacArthur, Lee Kelvin, Sophie Reed, Yusra AlSayyad
