DM-27180 (Data Management)

NaNs in measurePhotonTransferCurve.py causing failures


    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: cp_pipe
    • Labels:
      None

      Description

      When running the PTC analysis on BOT run 12606, many amps failed to return valid PTCs.  I traced this to saturated images in the flat pairs, which caused NaNs in the mean/variance values.  When the code runs the _getInitialGoodPoints routine, the medianRatio parameter becomes NaN, and then all points fail.  I was able to fix this by changing the medianRatio calculation from np.median to np.nanmedian, after which the PTC fits ran OK, but the plotPtc.py routine then failed to plot the PTCs.  As a workaround, I just eliminated the saturated flat pairs from the input deck, but long term the code needs to be robust to saturated inputs.  FWIW, I swear that this problem was not there a few weeks ago.
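
The failure mode described above can be reproduced in isolation. The following is a minimal sketch with made-up numbers, not the actual _getInitialGoodPoints code: the array `ratios` and the 0.1 tolerance are assumptions chosen for illustration. A single NaN makes np.median return NaN, every comparison against NaN is False, and so no point is accepted; np.nanmedian skips the NaN.

```python
import numpy as np

# Hypothetical variance/mean ratios for one amp. The NaN stands in for a
# saturated flat pair. These values are illustrative, not from run 12606.
ratios = np.array([0.98, 1.01, np.nan, 1.02, 0.99])

median_ratio = np.median(ratios)         # NaN propagates into the result
nan_median_ratio = np.nanmedian(ratios)  # NaN entries are ignored

# Comparisons against NaN are always False, so with np.median no point passes.
good_with_median = np.abs(ratios - median_ratio) < 0.1
good_with_nanmedian = np.abs(ratios - nan_median_ratio) < 0.1

print(median_ratio, good_with_median.sum())
print(nan_median_ratio, good_with_nanmedian.sum())
```

Note that the NaN entry itself is still rejected in the np.nanmedian case, which is the desired behaviour: only the saturated pair is dropped rather than the whole amp.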

            Activity

            plazas Andrés Alejandro Plazas Malagón added a comment - - edited

            I replaced np.median with np.nanmedian in _getInitialGoodPoints in ptc.py and, similarly, switched to np.nanmin and np.nanmax to calculate the axis limits in the plotting routine (which is what was causing it to fail in this case).

            With this, we still keep the NaNs in the raw vectors. It wasn't happening before because the raw vectors were being filled after the NaNs were discarded.
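
Because the raw vectors now keep their NaNs, anything that derives limits from them has to be NaN-aware. A minimal sketch of the difference, assuming a hypothetical `raw_means` vector (this is not the plotting code itself):

```python
import numpy as np

# Hypothetical raw mean-signal vector that still contains a NaN entry.
raw_means = np.array([1200.0, np.nan, 56000.0, 98000.0])

# np.min/np.max propagate the NaN, so limits built from them are unusable.
bad_limits = (np.min(raw_means), np.max(raw_means))

# np.nanmin/np.nanmax skip NaN entries and give finite limits.
good_limits = (np.nanmin(raw_means), np.nanmax(raw_means))

print(bad_limits)
print(good_limits)
```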

            Commands: (w_2020_41)

            measurePhotonTransferCurve.py /project/shared/BOT/rerun/cslage/PTC_LSSTCAM_New_12606 --rerun plazas/PTC_LSSTCAM_New_12606/2020OCT14 --id detector=36 expId=3020100800155^3020100800156^3020100800158^3020100800159^3020100800185^3020100800186^3020100800161^3020100800162^3020100800188^3020100800189^3020100800164^3020100800165^3020100800191^3020100800192^3020100800167^3020100800168^3020100800194^3020100800195^3020100800170^3020100800171^3020100800197^3020100800198^3020100800173^3020100800174^3020100800200^3020100800201^3020100800176^3020100800177^3020100800203^3020100800204^3020100800179^3020100800180^3020100800206^3020100800207^3020100800182^3020100800183^3020100800209^3020100800210^3020100800212^3020100800213^3020100800215^3020100800216^3020100800218^3020100800219^3020100800221^3020100800222 -c maxMeanSignal=100000 ptcFitType=EXPAPPROXIMATION doPhotodiode=False sigmaCutPtcOutliers=5.0 initialNonLinearityExclusionThresholdPositive=0.25  --clobber-config --clobber-version -j 1
            

            plotPhotonTransferCurve.py /project/shared/BOT/rerun/cslage/PTC_LSSTCAM_New_12606 --rerun /project/shared/BOT/rerun/cslage/PTC_LSSTCAM_New_12606/rerun/plazas/PTC_LSSTCAM_New_12606/2020OCT14  --id detector=36 -c datasetFileName=/project/shared/BOT/rerun/cslage/PTC_LSSTCAM_New_12606/rerun/plazas/PTC_LSSTCAM_New_12606/2020OCT14/calibrations/ptc/ptcDataset-det036.fits --clobber-versions --clobber-config -j 1
            

            Plots: PTC_det36.pdf

            mfisherlevine Merlin Fisher-Levine added a comment -

            Small comments on the docs, but great other than that.

            cslage Craig Lage added a comment -

            I really don't understand where these NaNs are coming from.  Eliminating the saturated images removed most of the issue, but there are still a few amps that are returning NaN for no apparent reason.  I whittled it down to this simple command line, which runs very fast:

            measurePhotonTransferCurve.py /project/shared/BOT/rerun/cslage/PTC_LSSTCAM_New_12606 --rerun /project/shared/BOT/rerun/cslage/PTC_LSSTCAM_New_12606 --id detector=180 expId=3020100800155^3020100800156    -c maxMeanSignal=100000 ptcFitType=EXPAPPROXIMATION initialNonLinearityExclusionThresholdPositive=0.25 doPhotodiode=False --clobber-versions -j 

            Then I added these print statements in measureMeanVarCov in ptc.py:

                    mu1 = afwMath.makeStatistics(im1Area, afwMath.MEANCLIP, im1StatsCtrl).getValue()
                    mu2 = afwMath.makeStatistics(im2Area, afwMath.MEANCLIP, im2StatsCtrl).getValue()
                    print("In measureMeanVarCov, amp = %s, expId = %s"%(ampName, exposure1.getInfo().getVisitInfo().getExposureId()))
                    print("im1Area.image.array min and max:", im1Area.image.array.min(), im1Area.image.array.max())
                    print("im1Area mean (mu1) as calculated by afwMath.makeStatistics",mu1)
                    print()
            

            When I run this, amps C02 and C07 return NaN for the mean, even though the images look fine and I can print out the min and max of the array data.  Other amps look OK. Something with the mask???

            In measureMeanVarCov, amp = C10, expId = 3020100800155180
            im1Area.image.array min and max: 36.060616 171.45981
            im1Area mean (mu1) as calculated by afwMath.makeStatistics 118.79587169182115
             
            In measureMeanVarCov, amp = C11, expId = 3020100800155180
            im1Area.image.array min and max: 35.585648 186.14578
            im1Area mean (mu1) as calculated by afwMath.makeStatistics 115.47448062124805
             
            In measureMeanVarCov, amp = C12, expId = 3020100800155180
            im1Area.image.array min and max: 35.35383 1805.6526
            im1Area mean (mu1) as calculated by afwMath.makeStatistics 116.74028567317016
             
            In measureMeanVarCov, amp = C13, expId = 3020100800155180
            im1Area.image.array min and max: 37.272396 202.33514
            im1Area mean (mu1) as calculated by afwMath.makeStatistics 115.1495522146488
             
            In measureMeanVarCov, amp = C14, expId = 3020100800155180
            im1Area.image.array min and max: 35.201366 168.98125
            im1Area mean (mu1) as calculated by afwMath.makeStatistics 114.183253251471
             
            In measureMeanVarCov, amp = C15, expId = 3020100800155180
            im1Area.image.array min and max: -12621.5 26990.564
            im1Area mean (mu1) as calculated by afwMath.makeStatistics 114.66133031705989
             
            In measureMeanVarCov, amp = C16, expId = 3020100800155180
            im1Area.image.array min and max: 37.31251 169.0054
            im1Area mean (mu1) as calculated by afwMath.makeStatistics 113.54425356970359
             
            In measureMeanVarCov, amp = C17, expId = 3020100800155180
            im1Area.image.array min and max: 35.534927 179.67491
            im1Area mean (mu1) as calculated by afwMath.makeStatistics 114.70147974456593
             
            In measureMeanVarCov, amp = C07, expId = 3020100800155180
            im1Area.image.array min and max: 107.59257 123.71165
            im1Area mean (mu1) as calculated by afwMath.makeStatistics nan
             
            measurePhotonTransferCurve WARN: NaN mean or var, or None cov in amp C07 in exposure pair 3020100800155180, 3020100800156180 of detector 180.
             
            In measureMeanVarCov, amp = C06, expId = 3020100800155180
            im1Area.image.array min and max: 42.44537 173.70656
            im1Area mean (mu1) as calculated by afwMath.makeStatistics 120.40967085989031
             
            In measureMeanVarCov, amp = C05, expId = 3020100800155180
            im1Area.image.array min and max: -1976.951 181.91766
            im1Area mean (mu1) as calculated by afwMath.makeStatistics 124.48124170966868
             
            In measureMeanVarCov, amp = C04, expId = 3020100800155180
            im1Area.image.array min and max: 43.84583 200.44437
            im1Area mean (mu1) as calculated by afwMath.makeStatistics 124.45436966906746
             
            In measureMeanVarCov, amp = C03, expId = 3020100800155180
            im1Area.image.array min and max: 39.0246 222.63539
            im1Area mean (mu1) as calculated by afwMath.makeStatistics 123.44538765460254
             
            In measureMeanVarCov, amp = C02, expId = 3020100800155180
            im1Area.image.array min and max: 75.340385 146.52866
            im1Area mean (mu1) as calculated by afwMath.makeStatistics nan
             
            measurePhotonTransferCurve WARN: NaN mean or var, or None cov in amp C02 in exposure pair 3020100800155180, 3020100800156180 of detector 180.
             
            In measureMeanVarCov, amp = C01, expId = 3020100800155180
            im1Area.image.array min and max: 5.981539 185.96135
            im1Area mean (mu1) as calculated by afwMath.makeStatistics 126.83769476779912
             
            In measureMeanVarCov, amp = C00, expId = 3020100800155180
            im1Area.image.array min and max: 47.930855 185.4127
            im1Area mean (mu1) as calculated by afwMath.makeStatistics 127.40075437510123
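
One way to get a finite min and max but a NaN mean is that the array min/max ignore the mask while the clipped mean honours it. The sketch below is a simplified stand-in for that behaviour, not the afwMath.makeStatistics implementation. The function `masked_mean`, the `bad_bits` value, and the pixel values are all assumptions for illustration.

```python
import math

def masked_mean(pixels, mask, bad_bits):
    """Mean of the pixels whose mask value shares no bits with bad_bits.

    Returns NaN when every pixel is rejected.
    """
    kept = [p for p, m in zip(pixels, mask) if (m & bad_bits) == 0]
    return sum(kept) / len(kept) if kept else math.nan

# Illustrative values only.
pixels = [107.6, 115.2, 123.7]
unflagged = [0, 0, 0]            # no mask bits set
all_flagged = [263, 263, 263]    # every pixel carries rejected bits

print(min(pixels), max(pixels))                  # ignores the mask entirely
print(masked_mean(pixels, unflagged, 0xFFFF))    # finite mean
print(masked_mean(pixels, all_flagged, 0xFFFF))  # nan: nothing left to average
```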
            
            

            cslage Craig Lage added a comment -

            Andrés and I think that the reason these amps are returning NaN is that the defect code has decided to mask out the entire amp for some reason.  Below are the first 15 pixels of row 100 in the mask plane.  All of the amps have a value of 128 for the first 10 pixels - this is the edge masking.  But amps C02 and C07 have a non-zero value in the interior.  Now the question is why they were masked out.  The images look reasonable, and the EOTest code returned valid gain values.

            C10 [128 128 128 128 128 128 128 128 128 128   0   0   0   0   0]
            C11 [128 128 128 128 128 128 128 128 128 128   0   0   0   0   0]
            C12 [128 128 128 128 128 128 128 128 128 128   0   0   0   0   0]
            C13 [128 128 128 128 128 128 128 128 128 128   0   0   0   0   0]
            C14 [128 128 128 128 128 128 128 128 128 128   0   0   0   0   0]
            C15 [128 128 128 128 128 128 128 128 128 128   0   0   0   0   0]
            C16 [128 128 128 128 128 128 128 128 128 128   0   0   0   0   0]
            C17 [128 128 128 128 128 128 128 128 128 128   0   0   0   0   0]
            C07 [391 391 391 391 391 391 391 391 391 391 263 263 263 263 263]
            C06 [128 128 128 128 128 128 128 128 128 128   0   0   0   0   0]
            C05 [128 128 128 128 128 128 128 128 128 128   0   0   0   0   0]
            C04 [128 128 128 128 128 128 128 128 128 128   0   0   0   0   0]
            C03 [390 390 390 390 390 128 128 128 128 128   0   0   0   0   0]
            C02 [391 391 391 391 391 391 391 391 391 391 263 263 263 263 263]
            C01 [128 128 128 128 128 128 128 128 128 128   0   0   0   0   0]
            C00 [128 128 128 128 128 128 128 128 128 128   0   0   0   0   0]
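
The mask values above are bit fields, so it can help to break them into individual bits. A minimal sketch follows. The helper `set_bits` is hypothetical, and the mapping from bit index to mask-plane name is not given in this ticket and would have to be read from the mask object itself.

```python
def set_bits(value):
    """Return the indices of the bits that are set in a mask value."""
    return [bit for bit in range(value.bit_length()) if value & (1 << bit)]

# Mask values that appear in row 100 above.
for value in (0, 128, 263, 390, 391):
    print(value, set_bits(value))
```

This shows that 391 is 263 combined with 128: the first ten pixels of C02 and C07 carry the same bit as the edge pixels of the other amps, plus the extra bits that are set across the interior.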
            

            plazas Andrés Alejandro Plazas Malagón added a comment -

            With the help of Chris, we traced the problem to the fact that amps C02 and C07 have negative saturation levels: https://github.com/lsst/obs_lsst/blob/master/policy/lsstCam/R43.yaml#L64 and that's why they were being masked/declared as bad.

            For the moment, setting isr.doSaturation=False or setting isr.saturation to some high level during ISR would help, but we are consulting (#dm-lsstcam) to see what the proper fix is.

            plazas Andrés Alejandro Plazas Malagón added a comment -

            I'll set the negative values to zero for now:

            grep "saturation : -" *.yaml
            R43.yaml:      C07 : { gain : 1.348025, readNoise : 6.527312, saturation : -4376471.000000 }
            R43.yaml:      C02 : { gain : 1.367428, readNoise : 6.830328, saturation : -5319002.500000 }
            R43.yaml:      C06 : { gain : 1.396356, readNoise : 6.672390, saturation : -122039.617188 }
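
The same check can be scripted so that it reports the amp and value rather than the raw line. This is only a sketch that parses the three grep lines quoted above with a regular expression. The pattern and variable names are assumptions, and a real check would more likely load the YAML files directly.

```python
import re

# Lines copied from the grep output above.
grep_output = """\
R43.yaml:      C07 : { gain : 1.348025, readNoise : 6.527312, saturation : -4376471.000000 }
R43.yaml:      C02 : { gain : 1.367428, readNoise : 6.830328, saturation : -5319002.500000 }
R43.yaml:      C06 : { gain : 1.396356, readNoise : 6.672390, saturation : -122039.617188 }
"""

# Capture the file name, the amp name, and the saturation value from each line.
pattern = re.compile(r"^(\S+):\s+(\w+)\s*:.*saturation\s*:\s*(-?\d+(?:\.\d+)?)")

negative = {}
for line in grep_output.splitlines():
    match = pattern.match(line)
    if match and float(match.group(3)) < 0:
        negative[(match.group(1), match.group(2))] = float(match.group(3))

for (filename, amp), value in sorted(negative.items()):
    print(filename, amp, value)
```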
            

            Discussion in Slack about the topic: https://lsstc.slack.com/archives/CBE964PR8/p1603131198011000?thread_ts=1602885539.009400&cid=CBE964PR8

            plazas Andrés Alejandro Plazas Malagón added a comment -

            https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/32879/pipeline

              People

              Assignee:
              plazas Andrés Alejandro Plazas Malagón
              Reporter:
              cslage Craig Lage
              Reviewers:
              Merlin Fisher-Levine
              Watchers:
              Andrés Alejandro Plazas Malagón, Christopher Waters, Craig Lage, Merlin Fisher-Levine
