Data Management / DM-31528

Add more log messages to the measure task

Details

    • Story
    • Status: Done
    • Resolution: Done
    • 2
    • Data Release Production
    • No

    Description

      measure (MeasureMergedCoaddSourcesTask) jobs can run for a long time without emitting any log messages. For example, see /scratch/brendal4/bps-gen3-dc2/submit/2.2i/runs/test-med-1/w_2021_32/DM-31348/20210809T172956Z/jobs/measure/3828/18/y/13500_measure_3828_18_y.3588136.err, where the 3rd log record arrived more than 1 hr after the 2nd. In other cases the gap can be 2 hr or longer. This causes problems when using PanDA on IDF: the lack of log activity is interpreted as a hung job, and the PanDA pilot times out.

      Even though we might tune PanDA for a longer timeout, it would be good to have more log messages from this task, so that one can check the status of a run while it is in progress.

      Please add more log messages to this task. Either INFO- or VERBOSE-level messages are fine, as the plan is to run these jobs with VERBOSE-level logging.

          Activity

            kannawad Arun Kannawadi added a comment -

            Jenkins run: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/34846/pipeline

            tjenness, as with other log-related tickets, could I assign you as the reviewer for this ticket as well?

            tjenness Tim Jenness added a comment -

            That output looks good. Yes please make pull requests and I can review.

            kannawad Arun Kannawadi added a comment -

            Thanks for agreeing to review this, Tim. The Jenkins run doesn't include ci_imsim or ci_hsc, since the new log messages would not appear there with the default interval of 10 minutes. I could either change the default to a couple of seconds to test with the CI datasets and then revert it to 10 minutes, or just treat my log outputs above as a sanity check.

            tjenness Tim Jenness added a comment -

            Looks good. Thanks. One minor comment about when to add the 600 seconds.

            fred3m Fred Moolekamp added a comment -

            I'll add a quick comment regarding deblending with scarlet. As far as I know it has never hung on ground-based data, so that is something to keep in mind. If anyone has found otherwise, please let me know. I mention this here because, as I noted on GitHub, the fix in this ticket will only work when a patch takes a long time due to multiple blends that collectively take longer than 600 s. But there are some patches where nearly the entire patch is a single blend, meaning the job will still appear to hang. So I opened https://github.com/pmelchior/scarlet/issues/252 in scarlet to implement a similar fix on the scarlet side.
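
For readers unfamiliar with the approach, the interval-based logging discussed in this thread can be sketched roughly as below. This is a minimal illustration using only the Python standard library, not the code merged for this ticket: the function name `process_with_periodic_log`, the logger name, and the injectable `clock` parameter are all hypothetical. It also shows the limitation raised above: the elapsed-time check only runs between items, so one item that takes longer than the interval produces no message while it runs.

```python
import logging
import time

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("measure_sketch")


def process_with_periodic_log(items, process, interval=600.0, clock=time.monotonic):
    """Process items in order, logging progress at most once per `interval` seconds.

    The elapsed-time check happens only after each item finishes, so a single
    item that runs longer than `interval` emits nothing until it completes.
    Returns the number of progress messages that were logged.
    """
    n_logged = 0
    next_log_time = clock() + interval
    for index, item in enumerate(items, start=1):
        process(item)
        now = clock()
        if now >= next_log_time:
            logger.info("Processed %d of %d items", index, len(items))
            n_logged += 1
            # Schedule the next message relative to the one just emitted.
            next_log_time = now + interval
    return n_logged
```

Passing a small `interval` (a couple of seconds) is one way to exercise the messages on a small test dataset without changing the default.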

            People

              kannawad Arun Kannawadi
              hchiang2 Hsin-Fang Chiang
              Tim Jenness
              Arun Kannawadi, Fred Moolekamp, Hsin-Fang Chiang, Huan Lin, Tim Jenness