Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-32376

ap_verify gen3 fails to find jointcal_photoCalib dataset in graph generation

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: pipe_tasks
    • Labels:
      None
    • Story Points:
      1
    • Epic Link:
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      ap_verify gen3 failed last night with the following error:

      RuntimeError: 2 dataset(s) of type 'jointcal_photoCalib' was/were present in a previous query, but could not be found now.This is either a logic bug in QuantumGraph generation or the input collections have been modified since QuantumGraph generation began.
      

      It seems likely that this was caused by DM-31491, which just landed and at a minimum reordered the dimensions for this dataset. https://github.com/lsst/pipe_tasks/commit/e5427992456a875016af14c907ea1f8bd786fb6f#diff-d0c181586a7ef6ce433ff7baee9b2fa03ab7de10015e6cafa57741a90a424d2dL77-R94

      I believe that there is also a typo later in the same PR: https://github.com/lsst/pipe_tasks/commit/e5427992456a875016af14c907ea1f8bd786fb6f#diff-d0c181586a7ef6ce433ff7baee9b2fa03ab7de10015e6cafa57741a90a424d2dR369

      Fixing this before the weekly would be highly desirable.

        Attachments

          Issue Links

            Activity

            Hide
            sophiereed Sophie Reed added a comment - - edited

            I found the bug and have tested ap_verify. I pulled ap_verify and ap_verify_ci_cosmos_pdr2 and then ran:

            ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen3 --output /project/sreed/apVerifyPdr2/ -p pipelines/ApVerifyWithFakes.yaml

            from the ap_verify_ci_cosmos_pdr2 repo.

            I also have a jenkins run going: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/35271/pipeline

            I ran: 

            pipetask run -b /repo/main/butler.yaml -i HSC/runs/RC2/w_2021_42/DM-32248 -p /home/sr525/repos/obs_subaru/pipelines/DRPFakes.yaml -d "instrument='HSC' and skymap='hsc_rings_v1' and (band='i') and (detector=82 or detector=81 or detector=80) and (visit=1228 or visit=1230)" -i u/sr525/ssiCats -o u/sr525/fakesTest6 --register-dataset-types

            which completed.

            These were all run with: 

            source /project/yusra/lsst_stack/loadLSST.bash
            setup lsst_distrib -t d_2021_10_25

            Show
            sophiereed Sophie Reed added a comment - - edited I found the bug and have tested ap_verify. I pulled ap_verify and ap_verify_ci_cosmos_pdr2 and then ran: ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen3 --output /project/sreed/apVerifyPdr2/ -p pipelines/ApVerifyWithFakes.yaml from the ap_verify_ci_cosmos_pdr2 repo. I also have a jenkins run going: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/35271/pipeline I ran:  pipetask run -b /repo/main/butler.yaml -i HSC/runs/RC2/w_2021_42/ DM-32248 -p /home/sr525/repos/obs_subaru/pipelines/DRPFakes.yaml -d "instrument='HSC' and skymap='hsc_rings_v1' and (band='i') and (detector=82 or detector=81 or detector=80) and (visit=1228 or visit=1230)" -i u/sr525/ssiCats -o u/sr525/fakesTest6 --register-dataset-types which completed. These were all run with:  source /project/yusra/lsst_stack/loadLSST.bash setup lsst_distrib -t d_2021_10_25
            Hide
            lskelvin Lee Kelvin added a comment -

            Thanks Sophie, and thanks Chris for your help too. This ticket removes unnecessary wcs and photoCalib code from processCcdWithFakes.py, and corrects a problematic typo that crops up further down the line. I can confirm that I'm able to successfully run both ap_verify and DRPFakes with this ticket.

            For testing, I first set up:

            setup lsst_distrib -t w_2021_43
            setup -j -r ap_verify
            setup -j -r ap_verify_ci_cosmos_pdr2
            setup -j -r pipe_tasks
            setup -j -r obs_subaru
            setup -j -r ci_hsc_gen3
            

            (all on their master branches). I believe these are all the repos which were touched since DM-31491 dropped, plus the required ap_verify repos.

            Next, I confirm that running on 'master':

            ap_verify.py -j 8 \
            --dataset ap_verify_ci_cosmos_pdr2 \
            --output /project/lskelvin/apverify/run/ \
            --gen3 \
            --data-query "visit in (59150, 59160) and band='g'" \
            -p $AP_VERIFY_CI_COSMOS_PDR2_DIR/pipelines/ApVerifyWithFakes.yaml \
            2>&1 | tee log-ApVerifyWithFakes.log
            

            fails with the above joincal error message.

            Switching pipe_tasks from 'master' to 'tickets/DM-32376' and then re-running now allows for ap_verify to complete successfully, with outputs as expected.

            I also ran on this ticket branch:

            pipetask --long-log run --register-dataset-types \
            -b /repo/main \
            -i HSC/runs/RC2/w_2021_42/DM-32248,u/sr525/ssiCats \
            -o u/lskelvin/ap_verify_fakes_test_ticket \
            -p $OBS_SUBARU_DIR/pipelines/DRPFakes.yaml \
            -d "instrument='HSC' AND skymap='hsc_rings_v1' AND band='i' AND detector=1 AND visit=1228" \
            2>&1 | tee log-DRPFakes-ticket.log
            

            which also appears to complete successfully, producing outputs as expected.

            Assuming Jenkins doesn't produce any red flags, then I think this is good to merge. Thanks!

            Show
            lskelvin Lee Kelvin added a comment - Thanks Sophie, and thanks Chris for your help too. This ticket removes unnecessary wcs and photoCalib code from processCcdWithFakes.py , and corrects a problematic typo that crops up further down the line. I can confirm that I'm able to successfully run both ap_verify and DRPFakes with this ticket. For testing, I first set up: setup lsst_distrib -t w_2021_43 setup -j -r ap_verify setup -j -r ap_verify_ci_cosmos_pdr2 setup -j -r pipe_tasks setup -j -r obs_subaru setup -j -r ci_hsc_gen3 (all on their master branches). I believe these are all the repos which were touched since DM-31491 dropped, plus the required ap_verify repos. Next, I confirm that running on 'master': ap_verify.py -j 8 \ --dataset ap_verify_ci_cosmos_pdr2 \ --output /project/lskelvin/apverify/run/ \ --gen3 \ --data-query "visit in (59150, 59160) and band='g'" \ -p $AP_VERIFY_CI_COSMOS_PDR2_DIR/pipelines/ApVerifyWithFakes.yaml \ 2>&1 | tee log-ApVerifyWithFakes.log fails with the above joincal error message. Switching pipe_tasks from 'master' to 'tickets/ DM-32376 ' and then re-running now allows for ap_verify to complete successfully, with outputs as expected. I also ran on this ticket branch: pipetask --long-log run --register-dataset-types \ -b /repo/main \ -i HSC/runs/RC2/w_2021_42/DM-32248,u/sr525/ssiCats \ -o u/lskelvin/ap_verify_fakes_test_ticket \ -p $OBS_SUBARU_DIR/pipelines/DRPFakes.yaml \ -d "instrument='HSC' AND skymap='hsc_rings_v1' AND band='i' AND detector=1 AND visit=1228" \ 2>&1 | tee log-DRPFakes-ticket.log which also appears to complete successfully, producing outputs as expected. Assuming Jenkins doesn't produce any red flags, then I think this is good to merge. Thanks!
            Hide
            yusra Yusra AlSayyad added a comment -

            I see a backport was requested from Tim. This ticket is a bugfix for DM-31491 which was merged the night before and broke ap_verify. Backporting this without DM-31491 does not make sense.

            Show
            yusra Yusra AlSayyad added a comment - I see a backport was requested from Tim. This ticket is a bugfix for DM-31491 which was merged the night before and broke ap_verify. Backporting this without DM-31491 does not make sense.
            Hide
            tjenness Tim Jenness added a comment -

            I only added the request when we thought that ci_hsc_gen3 was broken and this branch had the fix in it. Now that we know that ci_hsc_gen3 had the tag in the wrong place it is no longer important.

            Show
            tjenness Tim Jenness added a comment - I only added the request when we thought that ci_hsc_gen3 was broken and this branch had the fix in it. Now that we know that ci_hsc_gen3 had the tag in the wrong place it is no longer important.
            Hide
            nlust Nate Lust added a comment - - edited

            Tim is correct that ci_hsc_gen3 is now able to run after the pin move, however a problem still remains. If someone uses v23 obs_subaru fakes pipeline in the way that ci_hsc_gen3 (on the newer ref) was trying to,  the system will fail. Moving the pin on ci_hsc_gen3 back only works because it reverts it to a state where it is not using the obs_subaru fakes pipeline that exists in v23, but was instead using it's own fakes pipeline.

             

            Specifically we need this line from this change on pipe_tasks: https://github.com/lsst/pipe_tasks/commit/80aa66dc92c83c1e284b116ae07675d6559376d3#diff-7daf2a8d292dc80b5c277b573b18346f30955635071479a011e78be5e76531b3R315

            There is some confusion if this was the right ticket because of a past rebase issue, but that does not matter so much as getting the change set onto v23.

            Show
            nlust Nate Lust added a comment - - edited Tim is correct that ci_hsc_gen3 is now able to run after the pin move, however a problem still remains. If someone uses v23 obs_subaru fakes pipeline in the way that ci_hsc_gen3 (on the newer ref) was trying to,  the system will fail. Moving the pin on ci_hsc_gen3 back only works because it reverts it to a state where it is not using the obs_subaru fakes pipeline that exists in v23, but was instead using it's own fakes pipeline.   Specifically we need this line from this change on pipe_tasks: https://github.com/lsst/pipe_tasks/commit/80aa66dc92c83c1e284b116ae07675d6559376d3#diff-7daf2a8d292dc80b5c277b573b18346f30955635071479a011e78be5e76531b3R315 There is some confusion if this was the right ticket because of a past rebase issue, but that does not matter so much as getting the change set onto v23.

              People

              Assignee:
              sophiereed Sophie Reed
              Reporter:
              ktl Kian-Tat Lim
              Reviewers:
              Lee Kelvin
              Watchers:
              Eli Rykoff, Jim Bosch, Kian-Tat Lim, Lee Kelvin, Nate Lust, Sophie Reed, Tim Jenness, Yusra AlSayyad
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.