Need periodic log messages for forcedPhotCoadd during aperture corrections

Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points: 3
• Team: Data Release Production
• Urgent?: No

Description

The PanDA batch system requires each task to write a log message at least once every 2 hours, so that it can confirm the task is still running and not stuck. forcedPhotCoadd already writes VERBOSE log messages every 10 minutes, but it stops doing so once it starts applying aperture corrections. Because the aperture corrections can take several hours, the lack of log messages can lead PanDA to kill the job, requiring additional re-attempts that may eventually fail again for the same reason.

Additional periodic logging, at VERBOSE level, is therefore needed during aperture correction and any subsequent steps.
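
The requested behaviour can be sketched as a small rate-limited logger that is called on every loop iteration but only emits a message once a fixed interval has passed. This is a minimal illustration using the Python standard library, not the fix that was merged for this ticket: the `Heartbeat` class, the `apply_corrections` function, the numeric value chosen for `VERBOSE`, and the message text are all hypothetical. The 600-second default mirrors the 10-minute cadence mentioned above.

```python
import logging
import time

# Assumption: VERBOSE is modelled as a custom level between DEBUG (10) and INFO (20).
VERBOSE = 15
logging.addLevelName(VERBOSE, "VERBOSE")


class Heartbeat:
    """Emit a log message at most once per `interval` seconds (hypothetical helper)."""

    def __init__(self, logger, interval=600.0, level=VERBOSE, clock=time.monotonic):
        self._logger = logger
        self._interval = interval
        self._level = level
        self._clock = clock  # injectable so the timing can be tested without sleeping
        self._last = clock()

    def log(self, msg, *args):
        """Log `msg` if the interval has elapsed; return True if a message was emitted."""
        now = self._clock()
        if now - self._last < self._interval:
            return False
        self._logger.log(self._level, msg, *args)
        self._last = now
        return True


def apply_corrections(sources, correct, heartbeat):
    """Apply `correct` to every source, offering the heartbeat a chance to log each time."""
    results = []
    for i, source in enumerate(sources):
        results.append(correct(source))
        heartbeat.log("Aperture corrections applied to %d of %d sources", i + 1, len(sources))
    return results
```

Calling the heartbeat inside the loop keeps the cost per iteration to one clock read, and a monotonic clock avoids spurious gaps or bursts if the system time is adjusted during a long job.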

Activity

Colin Slater added a comment -

The looping_limit_default timeout in PanDA has been raised from 2 hours to 20 hours for the moment, so we are able to run successfully, but once this fix and DM-33820 are merged we should reset the limit to its normal 2 hours.


Arun Kannawadi added a comment -

The fix is ready to be reviewed. The ci_hsc datasets are too small to trigger the heartbeat logs. Jenkins build: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/36014/artifacts

Thanks for the review, Tim


Arun Kannawadi added a comment -

Merged into the main branch. Waiting for backport-approved tag from DM-CCB.
Tim Jenness added a comment -

Backport is approved.


People

Assignee:
Reporter: Huan Lin
Reviewers: Tim Jenness
Watchers: Arun Kannawadi, Colin Slater, Huan Lin, Shuwei Ye, Tim Jenness, Yusra AlSayyad