  1. Data Management
  2. DM-28389

Create a gen3 task to collate tract-level parquet tables for QA analyses

Details

    • 6
    • DRP S21a (Dec Jan)
    • Data Release Production
    • No

    Description

      A new, gen3-only repository for QA analysis is under construction and will eventually replace the gen2 pipe_analysis version that lives in lsst-dev.  Much of the functionality in the latter needs to be adapted and included in the former.  A crucial first step is to create a task that aggregates the tract-level catalogs into tables that include only the columns required for the QA plotting and analysis routines.  A PipelineTask will be written to perform this aggregation in the same manner as is currently done in the pipe_analysis scripts, so that all subsequent pipelines in the new analysis_drp can use its output as their input catalog dataset.

          Activity

            lauren Lauren MacArthur added a comment - - edited

            Ok, I believe I've got things working as desired.  The commit messages and docs hopefully convey the function and purpose of these new datasets.  I have persisted tables for all 6 bands in RC2 tract 9813 (COSMOS) and have checked that they all have the same number and ordering of rows as the associated objectTable_tract dataset (NaNs have been inserted for missing patches for a given band, as is done for the objectTable_tract tables). They can be accessed via, e.g.:

            In [1]: from lsst.daf.butler import Butler
            In [2]: rootDir = "/project/hsc/gen3repo/rc2w02_ssw03"
               ...: butler = Butler(rootDir, collections="u/lauren/DM-28389")
            In [3]: gQaTable_forced = butler.get("qaTractTable_forced", tract=9813, band="g")
            In [4]: objectTable_tract = butler.get("objectTable_tract", tract=9813)
            In [5]: gQaTable_forced.index.equals(objectTable_tract.index)
            Out[5]: True
            

            (the last line is the check on the matched number of rows and sorting of the catalogs, which is required to allow for ease of joint use, e.g. being able to use a single boolean expression to filter on both).
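
            A minimal sketch of why the matched index matters, using small synthetic pandas tables. The column names and values are made up for illustration and are not taken from the real datasets, and the use of reindex here is an assumption about one way to get NaN-filled, identically ordered rows, not a description of what the task itself does:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for objectTable_tract: one row per object.
object_table = pd.DataFrame(
    {"patch": [0, 0, 1, 2], "extendedness": [0.0, 1.0, 0.0, 1.0]},
    index=pd.Index([101, 102, 103, 104], name="objectId"),
)

# Synthetic per-band table: stored in a different order and missing
# object 104 (e.g. its patch has no data in this band).
band_table = pd.DataFrame(
    {"psfFlux": [3.0, 1.0, 2.0]},
    index=pd.Index([103, 101, 102], name="objectId"),
)

# Reindexing onto the object table's index inserts NaN rows for the
# missing object and enforces the same row ordering.
qa_table = band_table.reindex(object_table.index)

# The same check as in the session above.
assert qa_table.index.equals(object_table.index)

# With identical indexes, one boolean expression filters both tables.
is_star = object_table["extendedness"] < 0.5
stars_obj = object_table[is_star]
stars_qa = qa_table[is_star]
```

            Because both tables share one index, the mask built from one table selects the corresponding rows of the other without any join.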

            The pipeline command line to create these tables looks like:

            pipetask run -b /project/hsc/gen3repo/rc2w02_ssw03 -i HSC/runs/RC2/w_2021_02 -o u/lauren/DM-28389 -p $OBS_SUBARU_DIR/pipelines/makeQaTractTables.yaml -d "instrument='HSC' AND tract=9813 AND band='r' AND skymap='hsc_rings_v1'"
            

            (but leaving out the -i HSC/runs/RC2/w_2021_02 on subsequent runs.)

            There is a generic pipeline in analysis_drp. The override in obs_subaru is to add a filterMap between the generic "band" names and the physical HSC filters (this is required as the _obj tables are indexed on the latter).
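
            To illustrate the idea behind the filterMap override, here is a rough sketch of a band-to-physical-filter lookup applied to a table whose columns are grouped by physical filter. Everything in it is an assumption made for illustration: the mapping entries, the helper name physical_filter_for, and the synthetic table layout are hypothetical and are not the actual config or the _obj schema:

```python
import pandas as pd

# Hypothetical mapping from generic band names to physical filter names.
# The entries are illustrative, not copied from the obs_subaru override.
FILTER_MAP = {"g": "HSC-G", "r": "HSC-R"}


def physical_filter_for(band, filter_map=FILTER_MAP):
    """Return the physical filter name for a generic band name."""
    try:
        return filter_map[band]
    except KeyError:
        raise ValueError(f"No physical filter configured for band {band!r}")


# Synthetic table whose top column level is the physical filter name.
columns = pd.MultiIndex.from_tuples(
    [("HSC-G", "psfFlux"), ("HSC-R", "psfFlux")],
    names=["filter", "column"],
)
obj_table = pd.DataFrame([[1.0, 10.0], [2.0, 20.0]], columns=columns)

# Select the g-band columns via the mapped physical filter name.
g_columns = obj_table[physical_filter_for("g")]
```

            A lookup of this kind is only needed while the input tables are keyed on physical filter names rather than on bands.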

            lauren Lauren MacArthur added a comment -

            I'm putting you both as reviewers: I think sophiereed should comment on whether the tables provide what is needed for the scripts she is working on for this repository, and jbosch (or someone suitable he can appoint) should at least weigh in on the Gen3 aspects (it's my first Gen3 task, so I very likely have some less-than-ideal approaches in there!)

            lauren Lauren MacArthur added a comment -

            Already got some tips for improvement, so I'm putting this back to In Progress to clean things up before sending it back for review (but the data products themselves won’t be changed, so feel free to play with them to see if they are of use).

            lauren Lauren MacArthur added a comment - - edited

            Ok, back in for review.  I've added TODOs to remove the filterMap config song & dance once DM-28479 lands, but let me know if it's better to hold off merging this until that gets resolved.

            I'm currently creating the tables for all three RC2 tracts (only 9813 is done so far, well, almost...still working on y, but they should all be done by tonight).  They can be accessed as indicated in the first comment on this ticket. 

            jbosch Jim Bosch added a comment -

            I think I'm done reviewing (one final comment on the PR, in response to a question there). Thanks for handling all of my restructuring requests good-naturedly. I'll let sophiereed be the one to hit Reviewed in case she wants one more look.

            lauren Lauren MacArthur added a comment -

            Thanks, Jim!  I addressed your last comment and will now wait for a final review from sophiereed (which will also need approval of the obs_subaru PR if we still think it's necessary at that time).

            sophiereed Sophie Reed added a comment -

            Looks good to me.

            lauren Lauren MacArthur added a comment -

            Thanks to you both!  Merged and done.

            People

              lauren Lauren MacArthur
              lauren Lauren MacArthur
              Jim Bosch, Sophie Reed
              Jim Bosch, Lauren MacArthur, Lee Kelvin, Sophie Reed, Yusra AlSayyad