DM-33786

assembleCoadd reports success even when some stripes are unsuccessful.


    Details

    • Story Points:
      2
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      V&V looked at the coadd plots from DP0.2 step3 and noticed that objects were missing from patches in "banded" horizontal structures. After an exciting group debugging session, we determined that the root cause is that several portions of the coadds are missing. I've attached an example image from tract 2898, patch 26, i band, in collection "2.2i/runs/DP0.2/v23_0_1/PREOPS-905/step3_1". The bands at the bottom are the problem. Because they appear at random positions across the patch in each band, they sometimes add up to a significant fraction of the patch being affected.

       

      The tail of the assembleCoadd log file is:

      INFO Detected 3 positive peaks in 3 footprints and 0 negative peaks in 0 footprints to 5 sigma
      INFO Assembling 143 deepCoadd_directWarp
      ERROR Downloading s3://butler-us-central1-panda-dev/dc2/2.2i/runs/DP0.2/v23_0_1/PREOPS-905/step3_1/20220211T205638Z/deepCoadd_directWarp/2898/26/20250918/i/i_sim_1.4/923915/deepCoadd_directWarp_LSSTCam-imSim_2898_26_i_i_sim_1_4_923915_DC2_2_2i_runs_DP0_2_v23_0_1_PREOPS-905_step3_1_20220211T205638Z.fits to local file: Took 5.0734 seconds
      CRITICAL Cannot compute coadd (minimum=(19900, 11900), maximum=(24099, 12099)): An error occurred (429) when calling the GetObject operation: Too Many Requests
      ERROR Downloading s3://butler-us-central1-panda-dev/dc2/2.2i/runs/DP0.2/v23_0_1/PREOPS-905/step3_1/20220211T205638Z/deepCoadd_directWarp/2898/26/20251123/i/i_sim_1.4/966849/deepCoadd_directWarp_LSSTCam-imSim_2898_26_i_i_sim_1_4_966849_DC2_2_2i_runs_DP0_2_v23_0_1_PREOPS-905_step3_1_20220211T205638Z.fits to local file: Took 0.9210 seconds
      CRITICAL Cannot compute coadd (minimum=(19900, 12300), maximum=(24099, 12499)): An error occurred (429) when calling the GetObject operation: Too Many Requests
      INFO Creating psf model for interpolation from fwhm(pixels) = 3.0 [default]
      INFO fallbackValueType MEDIAN has been set to 0.0000
      INFO Interpolated over 120 NO_DATA pixels.
      INFO Execution of task 'assembleCoadd' on quantum {band: 'i', skymap: 'DC2', tract: 2898, patch: 26} took 10128.253 seconds
      

      This "Too Many Requests" error is a transient problem caused by processing exceeding the rate limit on Google Cloud Storage. These errors can be recovered by PanDA's retry mechanism, but because the quantum reported success, no retries were made. assembleCoadd should raise an exception in this case rather than continuing.
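      The difference between the two behaviours can be sketched as below. This is a minimal illustration, not the pipe_tasks code: `assemble_stripes`, `coadd_one_stripe`, `TransientReadError`, and the `swallow_errors` flag are all hypothetical names. With `swallow_errors=True` the loop logs the failure and moves on, so the caller sees success with stripes missing; with the default, the first failure propagates and an outer retry mechanism can see it.

```python
import logging

log = logging.getLogger("assembleCoaddSketch")  # hypothetical logger name


class TransientReadError(RuntimeError):
    """Hypothetical stand-in for a 429 'Too Many Requests' read failure."""


def assemble_stripes(stripes, coadd_one_stripe, swallow_errors=False):
    """Run coadd_one_stripe on each stripe and return the stripes that completed.

    swallow_errors=True mimics the behaviour reported in this ticket: the
    failure is only logged, and the function still returns normally.
    """
    done = []
    for stripe in stripes:
        try:
            coadd_one_stripe(stripe)
        except Exception as exc:
            if swallow_errors:
                # Logged at CRITICAL, but execution continues with the next stripe.
                log.critical("Cannot compute coadd %s: %s", stripe, exc)
                continue
            # Propagate so the whole quantum is marked as failed and can be retried.
            raise RuntimeError(f"Cannot compute coadd {stripe}") from exc
        done.append(stripe)
    return done
```

      In this sketch, the caller of the swallowing variant can only detect the problem by comparing the returned list with the input, which is effectively what inspecting the coadd plots did here.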

            Activity

            ctslater Colin Slater added a comment -

            The only days where I see an error message "Cannot compute coadd" in the logs are February 12, 13, and 14, so we may have gotten lucky on the runs after that. I will keep checking.

            yusra Yusra AlSayyad added a comment -

            Eli Rykoff, I’m offering this to you first because I touched your call to assembleOnlineMeanCoadd.
            It turns out self.log.fatal wasn’t doing what we thought it was doing (it’s not actually fatal), so I removed it: https://github.com/lsst/pipe_tasks/pull/636/files

            https://ci.lsst.codes/job/stack-os-matrix/35961/display/redirect
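            For reference, the same pitfall exists in Python's standard `logging` module, where `fatal` is just another name for the CRITICAL level and never interrupts execution. The sketch below uses only the standard library; that the task's `self.log` behaves the same way is an assumption based on the comment above, and the function names are hypothetical.

```python
import logging

log = logging.getLogger("fatalSketch")  # hypothetical logger name


def report_with_fatal():
    # Emits a CRITICAL-level record and then simply returns to the caller.
    log.fatal("Cannot compute coadd")
    return "still running"


def report_with_exception():
    # Raising is what actually stops execution and marks the work as failed.
    raise RuntimeError("Cannot compute coadd")
```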


              People

              Assignee:
              yusra Yusra AlSayyad
              Reporter:
              ctslater Colin Slater
              Reviewers:
              Eli Rykoff
              Watchers:
              Colin Slater, Eli Rykoff, Huan Lin, Jen Adelman-Mccarthy, Lauren MacArthur, Meredith Rawls, Sophie Reed, Tim Jenness, Yusra AlSayyad
