# ap_verify gen3 fails to find jointcal_photoCalib dataset in graph generation

XMLWordPrintable

#### Details

• Type: Bug
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
1
• Team:
Data Release Production
• Urgent?:
No

#### Description

ap_verify gen3 failed last night with the following error:

 RuntimeError: 2 dataset(s) of type 'jointcal_photoCalib' was/were present in a previous query, but could not be found now.This is either a logic bug in QuantumGraph generation or the input collections have been modified since QuantumGraph generation began. 

It seems likely that this was caused by DM-31491, which just landed and at a minimum reordered the dimensions for this dataset. https://github.com/lsst/pipe_tasks/commit/e5427992456a875016af14c907ea1f8bd786fb6f#diff-d0c181586a7ef6ce433ff7baee9b2fa03ab7de10015e6cafa57741a90a424d2dL77-R94

I believe that there is also a typo later in the same PR: https://github.com/lsst/pipe_tasks/commit/e5427992456a875016af14c907ea1f8bd786fb6f#diff-d0c181586a7ef6ce433ff7baee9b2fa03ab7de10015e6cafa57741a90a424d2dR369

Fixing this before the weekly would be highly desirable.

#### Activity

Hide
Sophie Reed added a comment - - edited

I found the bug and have tested ap_verify. I pulled ap_verify and ap_verify_ci_cosmos_pdr2 and then ran:

ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen3 --output /project/sreed/apVerifyPdr2/ -p pipelines/ApVerifyWithFakes.yaml

from the ap_verify_ci_cosmos_pdr2 repo.

I also have a jenkins run going: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/35271/pipeline

I ran:

pipetask run -b /repo/main/butler.yaml -i HSC/runs/RC2/w_2021_42/DM-32248 -p /home/sr525/repos/obs_subaru/pipelines/DRPFakes.yaml -d "instrument='HSC' and skymap='hsc_rings_v1' and (band='i') and (detector=82 or detector=81 or detector=80) and (visit=1228 or visit=1230)" -i u/sr525/ssiCats -o u/sr525/fakesTest6 --register-dataset-types

which completed.

These were all run with:

setup lsst_distrib -t d_2021_10_25

Show
Sophie Reed added a comment - - edited I found the bug and have tested ap_verify. I pulled ap_verify and ap_verify_ci_cosmos_pdr2 and then ran: ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen3 --output /project/sreed/apVerifyPdr2/ -p pipelines/ApVerifyWithFakes.yaml from the ap_verify_ci_cosmos_pdr2 repo. I also have a jenkins run going: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/35271/pipeline I ran:  pipetask run -b /repo/main/butler.yaml -i HSC/runs/RC2/w_2021_42/ DM-32248 -p /home/sr525/repos/obs_subaru/pipelines/DRPFakes.yaml -d "instrument='HSC' and skymap='hsc_rings_v1' and (band='i') and (detector=82 or detector=81 or detector=80) and (visit=1228 or visit=1230)" -i u/sr525/ssiCats -o u/sr525/fakesTest6 --register-dataset-types which completed. These were all run with:  source /project/yusra/lsst_stack/loadLSST.bash setup lsst_distrib -t d_2021_10_25
Hide
Lee Kelvin added a comment -

Thanks Sophie, and thanks Chris for your help too. This ticket removes unnecessary wcs and photoCalib code from processCcdWithFakes.py, and corrects a problematic typo that crops up further down the line. I can confirm that I'm able to successfully run both ap_verify and DRPFakes with this ticket.

For testing, I first set up:

 setup lsst_distrib -t w_2021_43 setup -j -r ap_verify setup -j -r ap_verify_ci_cosmos_pdr2 setup -j -r pipe_tasks setup -j -r obs_subaru setup -j -r ci_hsc_gen3 

(all on their master branches). I believe these are all the repos which were touched since DM-31491 dropped, plus the required ap_verify repos.

Next, I confirm that running on 'master':

 ap_verify.py -j 8 \ --dataset ap_verify_ci_cosmos_pdr2 \ --output /project/lskelvin/apverify/run/ \ --gen3 \ --data-query "visit in (59150, 59160) and band='g'" \ -p $AP_VERIFY_CI_COSMOS_PDR2_DIR/pipelines/ApVerifyWithFakes.yaml \ 2>&1 | tee log-ApVerifyWithFakes.log  fails with the above joincal error message. Switching pipe_tasks from 'master' to 'tickets/DM-32376' and then re-running now allows for ap_verify to complete successfully, with outputs as expected. I also ran on this ticket branch:  pipetask --long-log run --register-dataset-types \ -b /repo/main \ -i HSC/runs/RC2/w_2021_42/DM-32248,u/sr525/ssiCats \ -o u/lskelvin/ap_verify_fakes_test_ticket \ -p$OBS_SUBARU_DIR/pipelines/DRPFakes.yaml \ -d "instrument='HSC' AND skymap='hsc_rings_v1' AND band='i' AND detector=1 AND visit=1228" \ 2>&1 | tee log-DRPFakes-ticket.log 

which also appears to complete successfully, producing outputs as expected.

Assuming Jenkins doesn't produce any red flags, then I think this is good to merge. Thanks!

Show
Lee Kelvin added a comment - Thanks Sophie, and thanks Chris for your help too. This ticket removes unnecessary wcs and photoCalib code from processCcdWithFakes.py , and corrects a problematic typo that crops up further down the line. I can confirm that I'm able to successfully run both ap_verify and DRPFakes with this ticket. For testing, I first set up: setup lsst_distrib -t w_2021_43 setup -j -r ap_verify setup -j -r ap_verify_ci_cosmos_pdr2 setup -j -r pipe_tasks setup -j -r obs_subaru setup -j -r ci_hsc_gen3 (all on their master branches). I believe these are all the repos which were touched since DM-31491 dropped, plus the required ap_verify repos. Next, I confirm that running on 'master': ap_verify.py -j 8 \ --dataset ap_verify_ci_cosmos_pdr2 \ --output /project/lskelvin/apverify/run/ \ --gen3 \ --data-query "visit in (59150, 59160) and band='g'" \ -p $AP_VERIFY_CI_COSMOS_PDR2_DIR/pipelines/ApVerifyWithFakes.yaml \ 2>&1 | tee log-ApVerifyWithFakes.log fails with the above joincal error message. Switching pipe_tasks from 'master' to 'tickets/ DM-32376 ' and then re-running now allows for ap_verify to complete successfully, with outputs as expected. I also ran on this ticket branch: pipetask --long-log run --register-dataset-types \ -b /repo/main \ -i HSC/runs/RC2/w_2021_42/DM-32248,u/sr525/ssiCats \ -o u/lskelvin/ap_verify_fakes_test_ticket \ -p$OBS_SUBARU_DIR/pipelines/DRPFakes.yaml \ -d "instrument='HSC' AND skymap='hsc_rings_v1' AND band='i' AND detector=1 AND visit=1228" \ 2>&1 | tee log-DRPFakes-ticket.log which also appears to complete successfully, producing outputs as expected. Assuming Jenkins doesn't produce any red flags, then I think this is good to merge. Thanks!
Hide

I see a backport was requested from Tim. This ticket is a bugfix for DM-31491 which was merged the night before and broke ap_verify. Backporting this without DM-31491 does not make sense.

Show
Yusra AlSayyad added a comment - I see a backport was requested from Tim. This ticket is a bugfix for DM-31491 which was merged the night before and broke ap_verify. Backporting this without DM-31491 does not make sense.
Hide
Tim Jenness added a comment -

I only added the request when we thought that ci_hsc_gen3 was broken and this branch had the fix in it. Now that we know that ci_hsc_gen3 had the tag in the wrong place it is no longer important.

Show
Tim Jenness added a comment - I only added the request when we thought that ci_hsc_gen3 was broken and this branch had the fix in it. Now that we know that ci_hsc_gen3 had the tag in the wrong place it is no longer important.
Hide
Nate Lust added a comment - - edited

Tim is correct that ci_hsc_gen3 is now able to run after the pin move, however a problem still remains. If someone uses v23 obs_subaru fakes pipeline in the way that ci_hsc_gen3 (on the newer ref) was trying to,  the system will fail. Moving the pin on ci_hsc_gen3 back only works because it reverts it to a state where it is not using the obs_subaru fakes pipeline that exists in v23, but was instead using it's own fakes pipeline.

There is some confusion if this was the right ticket because of a past rebase issue, but that does not matter so much as getting the change set onto v23.

Show
Nate Lust added a comment - - edited Tim is correct that ci_hsc_gen3 is now able to run after the pin move, however a problem still remains. If someone uses v23 obs_subaru fakes pipeline in the way that ci_hsc_gen3 (on the newer ref) was trying to,  the system will fail. Moving the pin on ci_hsc_gen3 back only works because it reverts it to a state where it is not using the obs_subaru fakes pipeline that exists in v23, but was instead using it's own fakes pipeline.   Specifically we need this line from this change on pipe_tasks: https://github.com/lsst/pipe_tasks/commit/80aa66dc92c83c1e284b116ae07675d6559376d3#diff-7daf2a8d292dc80b5c277b573b18346f30955635071479a011e78be5e76531b3R315 There is some confusion if this was the right ticket because of a past rebase issue, but that does not matter so much as getting the change set onto v23.

#### People

Assignee:
Sophie Reed
Reporter:
Kian-Tat Lim
Reviewers:
Lee Kelvin
Watchers:
Eli Rykoff, Jim Bosch, Kian-Tat Lim, Lee Kelvin, Nate Lust, Sophie Reed, Tim Jenness, Yusra AlSayyad