Data Management / DM-31359

MaskStreaks sending bad matrix to scipy.linalg.cho_factor


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Two patches failed assembleCoadd in w_2021_30: /scratch/brendal4/bps-gen3-rc2/submit/HSC/runs/RC2/w_2021_30/DM-31182/20210805T183728Z/jobs/assembleCoadd

      See logs:
      20513_assembleCoadd_9697_0_i_.3540161.err
      20084_assembleCoadd_9697_9_i_.3540164.err

      How to reproduce:

      pipetask run -b /repo/main/ -i HSC/runs/RC2/w_2021_30/DM-31182 -o u/yourUserName/HSC/debug_w30 -p $OBS_SUBARU_DIR/pipelines/DRP.yaml#assembleCoadd -d "tract=9697 AND patch=9 AND skymap='hsc_rings_v1' AND band='i'" --register-dataset-types

            Activity

            Yusra AlSayyad added a comment:

            We are giving up on these patches for w_2021_30, so please look into them before the w_2021_34 stack gets tagged on August 18.

            Clare Saunders added a comment:

            Okay, will do.

            Lauren MacArthur added a comment:

            Curiously, this seems to be happening for a different patch on the Gen2 w_2021_30 run (DM-31184): tract 9813, HSC-I, patch "2,7" (seqPatchId: 65):

             lauren@lsst-condorprod-sub01:~$ grep "Caught LinAlgError" /home/mschmitz/HSC/RC2/rerun_scripts/DM-31184/logs/coadd9*
            /home/mschmitz/HSC/RC2/rerun_scripts/DM-31184/logs/coadd9813HSC-I.o54448:28383 WARN  2021-07-29T12:32:32.232-0500 coaddDriver: lsst-verify-worker41:28383: Caught LinAlgError while coadding DataId(initialdata={'tract': 9813, 'filter': 'HSC-I', 'patch': '2,7'}, tag=set()): 3-th leading minor of the array is not positive definite
            

            whereas there seem to be data for all patches in tracts 9697 and 9615.
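For reference, the error in the log above is what scipy.linalg.cho_factor raises when the Cholesky factorization fails, i.e. when the matrix it receives is not positive definite; the number in the message reports which leading minor failed (the exact wording may vary between scipy versions). The sketch below uses a hypothetical helper, try_cholesky_solve (not code from maskStreaks), and toy diagonal matrices to show the failure mode and one way a caller could catch it.

```python
import numpy as np
from numpy.linalg import LinAlgError  # scipy.linalg raises this same exception class
from scipy.linalg import cho_factor, cho_solve


def try_cholesky_solve(matrix, rhs):
    """Solve matrix @ x = rhs via Cholesky; return None if matrix is not positive definite."""
    try:
        factor = cho_factor(matrix)
    except LinAlgError as err:
        # The message names the leading minor that failed,
        # e.g. "3-th leading minor of the array is not positive definite"
        print(f"Caught LinAlgError: {err}")
        return None
    return cho_solve(factor, rhs)


rhs = np.array([1.0, 4.0, 16.0])
good = np.diag([1.0, 4.0, 16.0])   # positive definite
bad = np.diag([1.0, 4.0, -16.0])   # negative third diagonal entry: 3rd leading minor fails
print(try_cholesky_solve(good, rhs))  # [1. 1. 1.]
print(try_cholesky_solve(bad, rhs))   # prints the caught error, then None
```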
             

            Clare Saunders added a comment (edited):

            Lauren MacArthur, for better or worse, the Gen2 failure actually comes from a different problem. It alerted me to a line in maskStreaks that should have been abs(x) < y, not x < y.
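To illustrate the class of bug described above: when a signed distance is compared with a threshold, the one-sided test x < y also accepts every value far out on the negative side. The sketch below assumes a line in Hough normal form, x*cos(theta) + y*sin(theta) = rho, and a hypothetical helper, near_line_mask; it is not the actual maskStreaks line, which the comment does not quote.

```python
import numpy as np


def near_line_mask(x, y, theta, rho, half_width, use_abs=True):
    """Flag points within half_width of the line x*cos(theta) + y*sin(theta) = rho.

    Hypothetical helper for illustration; not the maskStreaks implementation.
    """
    # Signed perpendicular distance: negative on one side of the line, positive on the other
    signed_dist = x * np.cos(theta) + y * np.sin(theta) - rho
    if use_abs:
        return np.abs(signed_dist) < half_width  # correct: limits both sides
    return signed_dist < half_width  # buggy: everything on the negative side passes


# Vertical line x = 5 (theta = 0), sampled at points along y = 0
x = np.array([0.0, 4.5, 5.0, 5.5, 10.0])
y = np.zeros_like(x)
print(near_line_mask(x, y, theta=0.0, rho=5.0, half_width=1.0))
print(near_line_mask(x, y, theta=0.0, rho=5.0, half_width=1.0, use_abs=False))
```

The first call flags only the three points near x = 5; the one-sided version also flags x = 0, five units away from the line.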

            Lauren MacArthur added a comment:

            Awesome that it caught an otherwise hidden bug! I'm still puzzled why the "other" patches didn't fail in the Gen2 run...

            Clare Saunders added a comment (edited):

            The original Gen3 failures reported by Yusra AlSayyad were caused by negative variance in the input images. This should be solved by DM-31394.
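A minimal sketch of how negative variance can lead to this LinAlgError, assuming a generic weighted least-squares fit with weights 1/variance: the normal matrix A^T W A is guaranteed positive definite only when every weight is positive (and A has full column rank), so a single pixel with negative variance can make the Cholesky factorization fail. The helper weighted_line_fit, its straight-line model, and the masking guard are hypothetical choices for illustration; this is not the code from maskStreaks or from DM-31394.

```python
import numpy as np
from numpy.linalg import LinAlgError
from scipy.linalg import cho_factor, cho_solve


def weighted_line_fit(x, data, variance, mask_bad_variance=True):
    """Fit data ~ a + b*x by weighted least squares (weights = 1/variance) using Cholesky.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    variance = np.asarray(variance, dtype=float)
    if mask_bad_variance:
        # Drop pixels whose variance is non-finite or not strictly positive
        good = np.isfinite(variance) & (variance > 0)
        x, data, variance = x[good], data[good], variance[good]
    weights = 1.0 / variance
    design = np.column_stack([np.ones_like(x), x])
    # Normal matrix A^T W A: positive definite if all weights > 0 and A has full column rank;
    # a negative weight can make it indefinite, and cho_factor then raises LinAlgError
    normal = design.T @ (weights[:, None] * design)
    rhs = design.T @ (weights * data)
    return cho_solve(cho_factor(normal), rhs)


x = [0.0, 1.0, 2.0, 3.0]
data = [1.0, 3.0, 5.0, 7.0]        # exactly 1 + 2*x
variance = [1.0, 1.0, 1.0, -0.1]   # one pixel with negative variance

print(weighted_line_fit(x, data, variance))  # approximately [1. 2.]
try:
    weighted_line_fit(x, data, variance, mask_bad_variance=False)
except LinAlgError as err:
    print(f"Unmasked fit failed: {err}")
```

Masking non-positive variance is only one possible guard; fixing the variance upstream, as the comment above suggests, removes the cause rather than the symptom.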


              People

              Assignee:
              Clare Saunders
              Reporter:
              Yusra AlSayyad
              Reviewers:
              Christopher Waters
              Watchers:
              Christopher Waters, Clare Saunders, Lauren MacArthur, Yusra AlSayyad
              Votes:
              0

