  1. Data Management
  2. DM-31528

Add more log messages to the measure task

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Story Points:
      2
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      measure (MeasureMergedCoaddSourcesTask) jobs can run for a long time without emitting any log messages. For example, see /scratch/brendal4/bps-gen3-dc2/submit/2.2i/runs/test-med-1/w_2021_32/DM-31348/20210809T172956Z/jobs/measure/3828/18/y/13500_measure_3828_18_y.3588136.err, where the 3rd log record arrived more than 1 hr after the 2nd. In some other cases the gap can be more than 2 hr. This causes problems when using PanDA on the IDF: the lack of log activity is interpreted as a hung job, and the PanDA pilot times out.

      Even though we might tune PanDA for a longer timeout, it would be good to have more log messages when running this task, so that one can check the status of a run.

      Please add more log messages to this task. Either INFO- or VERBOSE-level logs are fine as the plan is to run these jobs with the VERBOSE-level logging.

            Activity

            kannawad Arun Kannawadi added a comment -

            Jenkins run: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/34846/pipeline

            Tim Jenness - as with other log-related tickets, could I assign you as the reviewer for this ticket as well?

            tjenness Tim Jenness added a comment -

            That output looks good. Yes please make pull requests and I can review.

            kannawad Arun Kannawadi added a comment -

            Thanks for agreeing to review this, Tim. The Jenkins run doesn't include ci_imsim or ci_hsc, since the new log messages would not appear there with the default interval of 10 minutes. I could either temporarily change the default to a couple of seconds to test with the CI datasets and then revert it to 10 minutes, or just treat my log outputs above as a sanity check.
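
For readers following the thread, the interval-based logging discussed here can be sketched roughly as below. This is a minimal illustration using only the Python standard library, not the code merged for this ticket: the names IntervalLogger and measure_all, the injectable clock argument, and the message text are all hypothetical. Only the 10-minute (600 s) default comes from the discussion.

```python
import logging
import time


class IntervalLogger:
    """Emit a log record only if `interval` seconds have passed since the last one.

    Hypothetical helper for illustration; not the class used in the LSST stack.
    """

    def __init__(self, logger, interval=600.0, level=logging.INFO, clock=time.monotonic):
        self.logger = logger
        self.interval = interval  # seconds between messages (10 minutes by default)
        self.level = level
        self._clock = clock  # injectable so tests can use a short interval or a fake clock
        self._next = clock() + interval

    def log(self, msg, *args):
        """Log `msg` if the interval has elapsed; return True if a record was emitted."""
        now = self._clock()
        if now < self._next:
            return False
        self.logger.log(self.level, msg, *args)
        # Schedule the next message relative to the time this one was emitted.
        self._next = now + self.interval
        return True


def measure_all(sources, measure_one, logger, interval=600.0):
    """Run `measure_one` on every source, reporting progress at most once per interval."""
    progress = IntervalLogger(logger, interval=interval)
    for index, source in enumerate(sources):
        measure_one(source)
        progress.log("Measurement complete for %d of %d sources", index + 1, len(sources))
```

Making the interval a parameter is what would allow a run on a small CI dataset to use a couple of seconds while production keeps the 10-minute default.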

            tjenness Tim Jenness added a comment -

            Looks good. Thanks. One minor comment about when to add the 600 seconds.

            fred3m Fred Moolekamp added a comment -

            I'll add a quick comment regarding deblending with scarlet. As far as I know it has never hung on ground-based data, so that is something to keep in mind. If anyone has found otherwise, please let me know. I mention this here because, as I noted on GitHub, the fix in this ticket only works when a patch takes a long time because of multiple blends that collectively take longer than 600 s. But there are some patches where nearly the entire patch is a blend, meaning the job will still appear to hang. So I opened https://github.com/pmelchior/scarlet/issues/252 to implement a similar fix on the scarlet side.
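
The limitation described in this comment can be illustrated with a small simulation. It is a sketch under stated assumptions, not the deblender's actual code: the function name simulate_progress_logs is hypothetical, the blend durations are made-up numbers, and the logger is assumed to be checked only once per blend, after that blend finishes.

```python
def simulate_progress_logs(blend_durations, interval=600.0):
    """Return the simulated times (in seconds) at which a progress message is emitted.

    The interval check happens only after each blend finishes, mirroring a
    logger that is called once per loop iteration.
    """
    emitted = []
    now = 0.0
    next_log = interval
    for duration in blend_durations:
        now += duration  # the blend runs; nothing can be logged in the meantime
        if now >= next_log:
            emitted.append(now)
            next_log = now + interval
    return emitted


# Many moderate blends (illustrative durations): messages arrive every 600 s.
many_small = simulate_progress_logs([200.0] * 12)

# One blend covering nearly the whole patch: the only message comes after 2 hr.
one_large = simulate_progress_logs([7200.0])

print(many_small)
print(one_large)
```

With these inputs the first case logs every 600 s, while the second stays silent for the full 7200 s, which is why a per-iteration progress message inside scarlet itself is needed for such patches.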


              People

              Assignee:
              kannawad Arun Kannawadi
              Reporter:
              hchiang2 Hsin-Fang Chiang
              Reviewers:
              Tim Jenness
              Watchers:
              Arun Kannawadi, Fred Moolekamp, Hsin-Fang Chiang, Huan Lin, Tim Jenness
