Data Management / DM-33854

Need periodic log messages for forcedPhotCoadd during aperture corrections

    Details

    • Story Points:
      3
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      The PanDA batch system needs tasks to write periodic log messages at least every 2 hours, so that it can tell the task is still running and not stuck.  forcedPhotCoadd already writes VERBOSE log messages every 10 minutes, but stops once it starts applying aperture corrections.  Because the aperture corrections can take several hours, the lack of log messages can lead to PanDA killing the job, requiring additional re-attempts that may eventually fail again for the same reason.

      So additional periodic logging, at VERBOSE level, is needed during aperture correction and any subsequent steps.
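
      The request above can be illustrated with a minimal sketch of rate-limited "heartbeat" logging inside a long-running loop. This is not the code merged for this ticket: it uses only the Python standard library, and the names `Heartbeat`, `apply_corrections`, and the numeric value chosen for `VERBOSE` are assumptions made for illustration.

```python
import logging
import time

# Assumption: VERBOSE sits between DEBUG (10) and INFO (20); the value is illustrative.
VERBOSE = 15
logging.addLevelName(VERBOSE, "VERBOSE")


class Heartbeat:
    """Hand a message to the logger only if `interval` seconds have elapsed."""

    def __init__(self, log, interval=600.0, level=VERBOSE, clock=time.monotonic):
        self.log = log
        self.interval = interval
        self.level = level
        self._clock = clock
        self._last = clock()  # start the timer at construction

    def maybe_log(self, msg, *args):
        """Return True if the message was handed to the logger, else False."""
        now = self._clock()
        if now - self._last < self.interval:
            return False
        self.log.log(self.level, msg, *args)
        self._last = now
        return True


def apply_corrections(records, correct, log, interval=600.0, clock=time.monotonic):
    """Apply `correct` to each record (hypothetical stand-in for a slow step).

    Returns the number of heartbeat messages handed to the logger.
    """
    heartbeat = Heartbeat(log, interval=interval, clock=clock)
    n_logged = 0
    for i, record in enumerate(records):
        correct(record)
        if heartbeat.maybe_log("Processed %d of %d records", i + 1, len(records)):
            n_logged += 1
    return n_logged
```

      Checking the elapsed time once per record keeps the overhead small, and the injectable `clock` makes the behaviour testable on small inputs that would otherwise finish before any heartbeat fires.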

            Activity

            hlin Huan Lin created issue -
            tjenness Tim Jenness made changes -
            Field Original Value New Value
            Link This issue relates to DM-33820 [ DM-33820 ]
            tjenness Tim Jenness made changes -
            Component/s pipe_tasks [ 10726 ]
            Component/s Science Pipelines [ 10706 ]
            tjenness Tim Jenness made changes -
            Component/s meas_base [ 10750 ]
            Component/s pipe_tasks [ 10726 ]
            tjenness Tim Jenness made changes -
            Link This issue relates to DM-31528 [ DM-31528 ]
            ctslater Colin Slater made changes -
            Link This issue relates to DM-33858 [ DM-33858 ]
            yusra Yusra AlSayyad made changes -
            Labels SciencePipelines SciencePipelines backport-v23
            kannawad Arun Kannawadi made changes -
            Assignee Arun Kannawadi [ kannawad ]
            ctslater Colin Slater added a comment -

            The looping_limit_default timeout in panda has been raised from 2 hours to 20 hours for the moment, so we are able to run successfully, but once this fix and DM-33820 are merged we should reset the limit back to its normal 2 hours.

            kannawad Arun Kannawadi made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            kannawad Arun Kannawadi added a comment -

            The fix is ready to be reviewed. The ci_hsc datasets are too small to trigger the heartbeat logs.

            Jenkins build: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/36014/artifacts

            kannawad Arun Kannawadi added a comment -

            Thanks for the review, Tim

            kannawad Arun Kannawadi made changes -
            Reviewers Tim Jenness [ tjenness ]
            Status In Progress [ 3 ] In Review [ 10004 ]
            tjenness Tim Jenness made changes -
            Status In Review [ 10004 ] Reviewed [ 10101 ]
            kannawad Arun Kannawadi added a comment -

            Merged into the `main` branch. Waiting for `backport-approved` tag from DM-CCB.

            kannawad Arun Kannawadi made changes -
            Resolution Done [ 10000 ]
            Status Reviewed [ 10101 ] Done [ 10002 ]
            tjenness Tim Jenness made changes -
            Labels SciencePipelines backport-v23 SciencePipelines backport-approved backport-v23
            tjenness Tim Jenness added a comment -

            Backport is approved.

            tjenness Tim Jenness made changes -
            Link This issue relates to DM-33919 [ DM-33919 ]
            kannawad Arun Kannawadi added a comment - Successful Jenkins run:  https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/36041/pipeline
            kannawad Arun Kannawadi made changes -
            Labels SciencePipelines backport-approved backport-v23 SciencePipelines backport-approved backport-done backport-v23
            yusra Yusra AlSayyad made changes -
            Epic Link PREOPS-1161 [ 1475339 ]
            Story Points 3
            Team Data Release Production [ 10301 ]

              People

              Assignee:
              kannawad Arun Kannawadi
              Reporter:
              hlin Huan Lin
              Reviewers:
              Tim Jenness
              Watchers:
              Arun Kannawadi, Colin Slater, Huan Lin, Shuwei Ye, Tim Jenness, Yusra AlSayyad
