Data Management / DM-27013

Change CalibrateTask refcat defaults to Gaia DR2 for astrometry and PS1 for photometry

    Details

    • Story Points: 4
    • Sprint: AP S22-3 (February), AP S22-4 (March), AP S22-5 (April)
    • Team: Alert Production
    • Urgent?: No

      Description

      This is the implementation ticket for RFC-697, to update the CalibrateTask reference catalog defaults to use Gaia DR2 for astrometry and PS1 for photometry, and to remove the relevant overrides from obs packages. These are the defaults that we want, although exactly how to implement them is still open (probably inside CalibrateConfig.setDefaults()?):

      astromRefObjLoader.ref_dataset_name = "gaia_dr2_20200414"
      astromRefObjLoader.anyFilterMapsToThis = "phot_g_mean"
      photoRefObjLoader.ref_dataset_name = "ps1_pv3_3pi_20170110"
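
One possible answer to the question above is sketched here, assuming the defaults end up in a `setDefaults()` method. The two classes are plain-Python stand-ins invented for illustration, not the real `lsst.pipe.tasks` or `lsst.pex.config` classes; only the three field values come from this ticket.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RefObjLoaderConfigSketch:
    """Hypothetical stand-in for a reference catalog loader config."""
    ref_dataset_name: str = ""
    anyFilterMapsToThis: Optional[str] = None


@dataclass
class CalibrateConfigSketch:
    """Hypothetical stand-in for CalibrateConfig; not the real class."""
    astromRefObjLoader: RefObjLoaderConfigSketch = field(
        default_factory=RefObjLoaderConfigSketch)
    photoRefObjLoader: RefObjLoaderConfigSketch = field(
        default_factory=RefObjLoaderConfigSketch)

    def setDefaults(self) -> None:
        # Astrometry: Gaia DR2, with any filter mapped to phot_g_mean.
        self.astromRefObjLoader.ref_dataset_name = "gaia_dr2_20200414"
        self.astromRefObjLoader.anyFilterMapsToThis = "phot_g_mean"
        # Photometry: PS1.
        self.photoRefObjLoader.ref_dataset_name = "ps1_pv3_3pi_20170110"


config = CalibrateConfigSketch()
config.setDefaults()
```

Setting the values in one central place is what would allow the obs packages to drop their own overrides, which is the stated aim of the ticket.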
      

            Activity

            Parejkoj John Parejko added a comment - - edited

            Jenkins with ci_hsc ci_imsim: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/35656/pipeline

            Parejkoj John Parejko added a comment -

            lsst_ci is failing with these changes on the decam and cfht quick tests. DM-33058 removes those tests (since they're gen2 only, the DECam data is deprecated, and the CFHT data has undergone significant changes), so I'm marking this as blocked on that ticket. I think that's the only remaining problem: I fixed other errors in the bigger CI packages.

            Parejkoj John Parejko added a comment - - edited

            New Jenkins after DM-33058: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/35720/pipeline

            Parejkoj John Parejko added a comment -

            Lauren MacArthur: Giving this to you to review, knowing that you've got a lot of gen2/gen3 comparison going on right now: this one isn't a rush. The review itself is medium-to-small, but the bigger question is how switching HSC to Gaia DR2 affects HSC single frame processing.

            I previously filed DM-27858 as a ticket to track any tests that go with this ticket, so we can use that to coordinate large scale processing tests.

            mrawls Meredith Rawls added a comment -

            DECam comment: I thought gaia was the DECam astrometry default across the board already! Turns out that is only true for ap_verify, not for ap_pipe, cute. Nonetheless, zero objection from the DECam department, or at least from me - please make it so.

            Parejkoj John Parejko added a comment -

            New Jenkins for new review: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/35931/pipeline

            Parejkoj John Parejko added a comment - - edited

            Lee Kelvin: do you mind doing this review? Lauren has said she won't have time for a while, and I'd like to not have it linger too long. I closed the HSC branch and deleted the ticket so that it doesn't mess up Jenkins: we'll hold off on changing the HSC defaults until after someone has time to run a bunch of validation tests on it. Otherwise, I think everything else here is still good.

            lskelvin Lee Kelvin added a comment - - edited

            Thank you for giving me the time to take a closer look at these changes, John. I've set up my own repo on lsst-devl and imported some test DECam data from the Merian survey, which we've been processing on the tiger2-sumire machine here at Princeton. In case you'd like to take a look at my data reductions yourself, here are the details:

            REPO: /project/lskelvin/repo
            vanilla run: u/lskelvin/scratch/DM-27013-vanilla
            ticket run: u/lskelvin/scratch/DM-27013-ticket
            

            I've pushed DECam visit 1056950 through step 1 and step 2 data reductions on both the vanilla stack (w_2022_15) and the stack with this ticket branch loaded for pipe_tasks and obs_decam. The data reductions were made using pipetask commands similar to this:

            Step 1:

            pipetask --long-log run --register-dataset-types -j 12 \
            -b /project/lskelvin/repo \
            -i DECam/runs/merian/w_2022_13 \
            --output-run u/lskelvin/scratch/DM-27013-vanilla \
            -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step1 \
            -d "instrument='DECam' AND skymap='hsc_rings_v1' AND visit=1056950"
            

            Step 2:

            pipetask --long-log run --register-dataset-types -j 12 \
            --extend-run \
            -b /project/lskelvin/repo \
            -i u/lskelvin/scratch/DM-27013-vanilla \
            --output-run u/lskelvin/scratch/DM-27013-vanilla \
            -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2 \
            -d "instrument='DECam' AND skymap='hsc_rings_v1' AND visit=1056950"
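
The two commands differ only in the step label, the input collection, and the `--extend-run` flag. A minimal sketch of that structure follows; `pipetask_args` is a hypothetical helper written for illustration, and every flag and path in it is copied from the commands above.

```python
from typing import List


def pipetask_args(step: str, input_coll: str, output_run: str,
                  extend: bool = False) -> List[str]:
    """Assemble the argument list for one step (hypothetical helper)."""
    args = ["pipetask", "--long-log", "run", "--register-dataset-types",
            "-j", "12"]
    if extend:
        # Step 2 writes into the run that step 1 already created.
        args.append("--extend-run")
    args += [
        "-b", "/project/lskelvin/repo",
        "-i", input_coll,
        "--output-run", output_run,
        "-p", f"$DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#{step}",
        "-d", "instrument='DECam' AND skymap='hsc_rings_v1' AND visit=1056950",
    ]
    return args


run = "u/lskelvin/scratch/DM-27013-vanilla"
step1 = pipetask_args("step1", "DECam/runs/merian/w_2022_13", run)
step2 = pipetask_args("step2", run, run, extend=True)
```

For the ticket-branch reduction, the same sketch would presumably be called with `u/lskelvin/scratch/DM-27013-ticket` as the run name. Note that `$DRP_PIPE_DIR` is left as a literal string here and would need expanding before the list is executed.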
            

            So far, I've begun looking at the sourceTable_visit dataset type to ascertain how much has changed due to the impact of this ticket. Here are the numbers of 'good' sources (as given by detect_isPrimary) over the total number of sources in each table:

            vanilla: 60313 / 71202
            ticket: 60412 / 71319
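
A minimal sketch of how such counts can be formed and compared, assuming each table row exposes a boolean `detect_isPrimary` field. The toy rows are made up for illustration; the two count pairs are the ones reported above.

```python
from typing import Dict, Iterable, Tuple


def good_over_total(rows: Iterable[Dict[str, bool]]) -> Tuple[int, int]:
    """Return (number of detect_isPrimary rows, total number of rows)."""
    rows = list(rows)
    good = sum(1 for row in rows if row["detect_isPrimary"])
    return good, len(rows)


# Toy rows, purely illustrative; not taken from the real table.
toy = [{"detect_isPrimary": True}, {"detect_isPrimary": False},
       {"detect_isPrimary": True}]
toy_counts = good_over_total(toy)

# Counts reported in the comment above.
vanilla = (60313, 71202)
ticket = (60412, 71319)
delta_good = ticket[0] - vanilla[0]
delta_total = ticket[1] - vanilla[1]
frac_vanilla = vanilla[0] / vanilla[1]
frac_ticket = ticket[0] / ticket[1]
```

With the reported numbers, the ticket run has 99 more primary sources and 117 more sources in total, while the primary fraction is about 0.847 in both runs.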
            

            My first plot is a comparison of the psfFlux_apCorr data, attached below.

            As you can see, the overall catalogue numbers are close but not identical, and the aperture corrected PSF fluxes appear to change by a significant amount. I'm continuing to make comparisons, but wanted to put these first quick results up here now to let you know my initial thoughts.

            lskelvin Lee Kelvin added a comment -

            I was confused as to what the 'correct' answer should be in the results above, so I've reduced DECam visit 1056950 using weeklies 12, 13, 14 and 15, and with this ticket. Further analysis below.

            Here are the numbers of detect_isPrimary ('good') sources over the total number of sources in the sourceTable_visit:

            w12 good/total = 60412 / 71319
            w13 good/total = 60412 / 71319
            w14 good/total = 60313 / 71202
            w15 good/total = 60313 / 71202
            tkt good/total = 60412 / 71319
            

            As shown, it looks like a change was introduced in w14 that modified these numbers, before coming back into line on this ticket branch. Looking at the w14 changelog, I'm not sure exactly what the culprit may be, but perhaps DM-33857 is the chief suspect?

            As a further check, I took a look at the median sky source flux (selecting sky sources using the sky_source boolean flag, and picking out the 9-pixel circular aperture flux using ap09Flux):

            w12 median sky flux = 0.04073
            w13 median sky flux = 3.12717
            w14 median sky flux = 0.14926
            w15 median sky flux = 0.14926
            tkt median sky flux = 0.04073
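
The selection described above (keep rows flagged `sky_source`, then take the median of `ap09Flux`) can be sketched as follows. The rows are toy values made up for illustration, and loading the real sourceTable_visit is deliberately left out.

```python
from statistics import median
from typing import Dict, List, Union

Row = Dict[str, Union[bool, float]]


def median_sky_flux(rows: List[Row]) -> float:
    """Median ap09Flux over the rows flagged as sky sources."""
    sky = [row["ap09Flux"] for row in rows if row["sky_source"]]
    return median(sky)


# Toy rows, purely illustrative; the values are made up.
toy = [
    {"sky_source": True, "ap09Flux": -0.5},
    {"sky_source": True, "ap09Flux": 0.25},
    {"sky_source": False, "ap09Flux": 1200.0},  # real source, ignored
    {"sky_source": True, "ap09Flux": 1.0},
]
result = median_sky_flux(toy)
```

The median is used rather than the mean so that a few contaminated sky apertures do not dominate the statistic.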
            

            Things here changed in w13, and again in w14, before again coming back to the w12 value on this ticket branch. The above DM-33857 may have had an impact here in w14. From the w13 changelog, DM-34019 touches pipe_tasks, but I'm not sure that's the issue here?

            Before I run any further checks, would it be okay for us to rebase this ticket branch to the latest main branch? At which point, I'll regenerate the tests above to see if anything has changed.

            Parejkoj John Parejko added a comment - - edited

            I have rebased all the branches onto main.

            lauren Lauren MacArthur added a comment -

            It seems the rebased run is indeed required for isolating & assessing the changes due to this ticket (and just what is going on with the sky sources may be worth some investigation!). When you get the new rebased run results, I’d also be curious to see some of the logs from the astrometry task, in particular those looking like:

             "Matched and fit WCS in %d iterations; "
             "found %d matches with on-sky distance mean and scatter = %0.3f +- %0.3f arcsec"
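
A minimal sketch of how log lines of this form could be picked out and parsed, assuming the two quoted pieces are concatenated into a single message. The example line and its numbers are made up for illustration.

```python
import re

# Pattern mirroring the format string quoted above (assumption: the two
# quoted pieces form one log message).
PATTERN = re.compile(
    r"Matched and fit WCS in (?P<iterations>\d+) iterations; "
    r"found (?P<matches>\d+) matches with on-sky distance mean and scatter = "
    r"(?P<mean>\d+\.\d+) \+- (?P<scatter>\d+\.\d+) arcsec"
)

# Hypothetical example line; the numbers are made up.
line = ("Matched and fit WCS in 3 iterations; found 80 matches with on-sky "
        "distance mean and scatter = 0.050 +- 0.030 arcsec")

m = PATTERN.search(line)
parsed = {
    "iterations": int(m["iterations"]),
    "matches": int(m["matches"]),
    "mean": float(m["mean"]),        # arcsec
    "scatter": float(m["scatter"]),  # arcsec
}
```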
            

            lskelvin Lee Kelvin added a comment -

            Thanks for rebasing, John. I've updated my test script above, rerun everything on w16 and w16+DM-27013, and updated my sky source results below:

            The number of detect_isPrimary ('good') sources over the total number of sources in the sourceTable_visit:

            w12 good/total = 60412 / 71319
            w13 good/total = 60412 / 71319
            w14 good/total = 60313 / 71202
            w15 good/total = 60313 / 71202
            w16 good/total = 60313 / 71202
            tkt good/total = 60412 / 71319
            

            and the median sky source flux:

            w12 median sky flux = 0.04073
            w13 median sky flux = 3.12717
            w14 median sky flux = 0.14926
            w15 median sky flux = 0.14926
            w16 median sky flux = 0.14926
            tkt median sky flux = 0.04073
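
To make the pattern in the two tables explicit, the runs can be grouped by identical metric values. This is only a sketch over the numbers reported above; the dictionary layout is chosen for illustration.

```python
from collections import defaultdict

# (good, total, median sky flux) per run, as reported above.
metrics = {
    "w12": (60412, 71319, 0.04073),
    "w13": (60412, 71319, 3.12717),
    "w14": (60313, 71202, 0.14926),
    "w15": (60313, 71202, 0.14926),
    "w16": (60313, 71202, 0.14926),
    "tkt": (60412, 71319, 0.04073),
}

# Group run labels that share exactly the same three values.
groups = defaultdict(list)
for run, values in metrics.items():
    groups[values].append(run)

# Weeklies whose metrics are identical to the ticket branch.
matches_ticket = [run for run in metrics
                  if run != "tkt" and metrics[run] == metrics["tkt"]]
```

Only w12 matches the ticket branch on all three values; w13 shares the source counts but not the sky flux, and w14 through w16 form a third group.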
            

            With these results in mind, it seems that on the ticket branch the metrics above are back in sync with their prior w12 values. I attach my step 1 data processing logs for w12, w16 and the ticket branch reduction to this ticket, for reference.

            In response to your astrometry question Lauren MacArthur, mean/scatter values are significantly lower on this ticket branch, e.g., for detector 25 in my example visit:

            w12: found 89 matches with on-sky distance mean and scatter = 0.051 +- 0.025 arcsec
            w16: found 89 matches with on-sky distance mean and scatter = 0.051 +- 0.025 arcsec
            tkt: found 85 matches with on-sky distance mean and scatter = 0.010 +- 0.006 arcsec
            

            Detector 40:

            w12: found 69 matches with on-sky distance mean and scatter = 0.060 +- 0.037 arcsec
            w16: found 70 matches with on-sky distance mean and scatter = 0.059 +- 0.038 arcsec
            tkt: found 59 matches with on-sky distance mean and scatter = 0.008 +- 0.006 arcsec
            

            Detector 60:

            w12: found 71 matches with on-sky distance mean and scatter = 0.071 +- 0.038 arcsec
            w16: found 71 matches with on-sky distance mean and scatter = 0.071 +- 0.038 arcsec
            tkt: found 64 matches with on-sky distance mean and scatter = 0.012 +- 0.006 arcsec
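
The size of the change can be summarised per detector with a short sketch over the w16 and ticket-branch lines quoted above. The dictionary layout and the two summary quantities are illustrative choices, not something taken from the pipeline.

```python
# (matches, mean, scatter) with mean and scatter in arcsec, as reported above.
results = {
    25: {"w16": (89, 0.051, 0.025), "tkt": (85, 0.010, 0.006)},
    40: {"w16": (70, 0.059, 0.038), "tkt": (59, 0.008, 0.006)},
    60: {"w16": (71, 0.071, 0.038), "tkt": (64, 0.012, 0.006)},
}

summary = {}
for detector, runs in results.items():
    n_old, mean_old, _ = runs["w16"]
    n_new, mean_new, _ = runs["tkt"]
    summary[detector] = {
        # Values above 1 mean the ticket branch has the smaller mean offset.
        "mean_ratio": mean_old / mean_new,
        # Match count on the ticket branch relative to w16.
        "match_ratio": n_new / n_old,
    }
```

For these three detectors the mean on-sky distance drops by roughly a factor of five to seven, while the number of matches falls by between about 4% and 16%.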
            

            I'm reluctant to hold up this ticket any longer, as I can't see anything that may be a cause for concern in the code changes in the various PRs. With that said, I'm keen to hear the thoughts of others as to whether the above results should be looked into more closely on this ticket, or punted to a future ticket instead if need be?

            lauren Lauren MacArthur added a comment -

            Thanks for the detailed report, Lee! I was curious if we’d see significantly fewer matches due to the lower density of Gaia, but clearly this is not an issue for this region. The lower mean & scatter does seem significant (in the right direction)!

            lskelvin Lee Kelvin added a comment -

            Ok, thanks both. Final comment: as HSC-specific changes are not taking place on this ticket, it might be best to set up an HSC-specific ticket now and leave an inline comment somewhere in obs_subaru pointing to both this ticket and the future HSC ticket, for reference. Otherwise, I worry that these changes made elsewhere in the stack will be forgotten for HSC, and may not ultimately make their way there.

            With all this in mind, however, this looks good to merge to me. If any issues surrounding the sky source metric changes from w12 to w16 need to be further investigated, I suggest that can take place on a separate ticket. Thanks, John.

            Parejkoj John Parejko added a comment -

            DM-27858 is the relevant ticket for validating the HSC changes. I don't know who is going to be responsible for that, but I'll bring it up on slack.

            Parejkoj John Parejko added a comment -

            Thank you Lee Kelvin for investigating this ticket on DECam. I've merged all the branches, but closed the obs_subaru PR without merging it.

            Parejkoj John Parejko added a comment -

            I neglected to re-run Jenkins after rebasing everything, and my merge broke the ap_verify package. I've put a fix for that on tickets/DM-27013-fix, but there is apparently a separate breakage when running ap_verify; I filed DM-34491 for that one.

            lskelvin Lee Kelvin added a comment -

            Thanks for the update John. For completeness, the fix PR is at this link.

            Parejkoj John Parejko added a comment -

            And one final PR to fix a pipelines_check breakage that was masked by the unmerged obs_subaru branch: https://github.com/lsst/obs_subaru/pull/416


              People

              Assignee: Parejkoj John Parejko
              Reporter: Parejkoj John Parejko
              Reviewers: Lee Kelvin
              Watchers: Ian Sullivan, John Parejko, Krzysztof Findeisen, Lauren MacArthur, Lee Kelvin, Meredith Rawls, Yusra AlSayyad