Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-24762

Add option to PTC task to correct for sigma clipping bias

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: cp_pipe
    • Labels:
      None

      Description

      When sigma-clipping to calculate the mean and variances, biases are potentially introduced at both the high and low flux ends, and should be corrected for somehow.

      It's not clear how to do this well, but this ticket notes at least considering how we should approach it.

        Attachments

          Activity

          Hide
          plazas Andrés Alejandro Plazas Malagón added a comment -

          Comparing the two formulas, there are differences > 1% when sigma < 2.5:

          print ("Sigma | Hirata&Choi | Waters | Fractional Residual (%)")
          for sigma in nSigmaClip:
              v1 = 1./np.sqrt(f(sigma))
              v2 = sigmaClipCorrection(sigma)
              print (sigma, 1./np.sqrt(f(sigma) ), sigmaClipCorrection(sigma), 100*(v1/v2 - 1)) 
          

          Sigma | Hirata&Choi | Waters | Fractional Residual (%)
          2.0 1.2924216342232249 1.2262586964500768 5.395512216523701
          2.5 1.0973860297375746 1.0887436390646081 0.7937948257857652
          3.0 1.027393469477902 1.0266630753374586 0.07114253526683267
          3.5 1.0061492039598166 1.0061116223474766 0.0037353322936750644
          4.0 1.0010718572754302 1.0010707096275262 0.00011464204205946515
          4.5 1.0001438753446412 1.0001438546475043 2.0694159941214707e-06
          5.0 1.0000148674247078 1.0000148672036708 2.2103363583880764e-08
          5.5 1.0000011846750532 1.0000011846736496 1.403543947731123e-10
          6.0 1.0000000729105996 1.0000000729105945 5.10702591327572e-13
          6.5 1.0000000034704237 1.0000000034704235 2.220446049250313e-14
          7.0 1.0000000001278861 1.0000000001278861 0.0
          7.5 1.0000000000036515 1.0000000000036515 0.0
          8.0 1.0000000000000808 1.0000000000000808 0.0
          8.5 1.0000000000000016 1.0000000000000013 2.220446049250313e-14
          9.0 1.0 1.0 0.0
          9.5 1.0 1.0 0.0
          10.0 1.0 1.0 0.0
          10.5 1.0 1.0 0.0
          

          Hirata&Choi2020 Hirata and Choi ignore second-order terms and higher in the correlation coefficient (see formula above).

          # Hirata and Choi formula 
          def f(nSigClip):
              epsilon = 1.0 - norm.cdf(nSigClip)
              return (1 - np.sqrt(2/np.pi)*((nSigClip*np.exp(-nSigClip**2/2))/(1 - 2*epsilon)))**2
          

          # Chris Waters's formula
          def sigmaClipCorrection(nSigClip):
              varFactor = 1.0 + (2 * nSigClip * norm.pdf(nSigClip)) / (norm.cdf(nSigClip) - norm.cdf(-nSigClip))
              return varFactor #1.0 / np.sqrt(varFactor)
          

          Show
          plazas Andrés Alejandro Plazas Malagón added a comment - Comparing the two formulas, there are differences > 1% when sigma < 2.5: print ( "Sigma | Hirata&Choi | Waters | Fractional Residual (%)" ) for sigma in nSigmaClip: v1 = 1. / np.sqrt(f(sigma)) v2 = sigmaClipCorrection(sigma) print (sigma, 1. / np.sqrt(f(sigma) ), sigmaClipCorrection(sigma), 100 * (v1 / v2 - 1 )) Sigma | Hirata&Choi | Waters | Fractional Residual (%) 2.0 1.2924216342232249 1.2262586964500768 5.395512216523701 2.5 1.0973860297375746 1.0887436390646081 0.7937948257857652 3.0 1.027393469477902 1.0266630753374586 0.07114253526683267 3.5 1.0061492039598166 1.0061116223474766 0.0037353322936750644 4.0 1.0010718572754302 1.0010707096275262 0.00011464204205946515 4.5 1.0001438753446412 1.0001438546475043 2.0694159941214707e-06 5.0 1.0000148674247078 1.0000148672036708 2.2103363583880764e-08 5.5 1.0000011846750532 1.0000011846736496 1.403543947731123e-10 6.0 1.0000000729105996 1.0000000729105945 5.10702591327572e-13 6.5 1.0000000034704237 1.0000000034704235 2.220446049250313e-14 7.0 1.0000000001278861 1.0000000001278861 0.0 7.5 1.0000000000036515 1.0000000000036515 0.0 8.0 1.0000000000000808 1.0000000000000808 0.0 8.5 1.0000000000000016 1.0000000000000013 2.220446049250313e-14 9.0 1.0 1.0 0.0 9.5 1.0 1.0 0.0 10.0 1.0 1.0 0.0 10.5 1.0 1.0 0.0 Hirata&Choi2020 Hirata and Choi ignore second-order terms and higher in the correlation coefficient (see formula above). # Hirata and Choi formula def f(nSigClip): epsilon = 1.0 - norm.cdf(nSigClip) return ( 1 - np.sqrt( 2 / np.pi) * ((nSigClip * np.exp( - nSigClip * * 2 / 2 )) / ( 1 - 2 * epsilon))) * * 2 # Chris Waters's formula def sigmaClipCorrection(nSigClip): varFactor = 1.0 + ( 2 * nSigClip * norm.pdf(nSigClip)) / (norm.cdf(nSigClip) - norm.cdf( - nSigClip)) return varFactor #1.0 / np.sqrt(varFactor)
          Hide
          plazas Andrés Alejandro Plazas Malagón added a comment - - edited

          Chris Waters points out that the difference at low sigma appears to be betwee his use of f ~ 1/ (1 + x) and H&C20's use of f ~ 1 - x, for small x.

          We'll stick with the existing code that Chris has in the crosstalk algorithm (moving it to utils and then import it from there).

          Show
          plazas Andrés Alejandro Plazas Malagón added a comment - - edited Chris Waters points out that the difference at low sigma appears to be betwee his use of f ~ 1/ (1 + x) and H&C20's use of f ~ 1 - x , for small x . We'll stick with the existing code that Chris has in the crosstalk algorithm (moving it to utils and then import it from there).
          Hide
          plazas Andrés Alejandro Plazas Malagón added a comment - - edited

          Chris's formula had a sign typo. Using:

          # Chris Waters's formula
          def sigmaClipCorrection(nSigClip):
              varFactor = 1.0 - (2 * nSigClip * norm.pdf(nSigClip)) / (norm.cdf(nSigClip) - norm.cdf(-nSigClip))
              return varFactor #1.0 / np.sqrt(varFactor)
          

          the two formulas agree (varFactor above is equivalent to 1/sqrt(f), where f is the formula in Hirata&Choi2020):

          Sigma | Hirata&Choi | Waters | Fractional Residual (%)
          2.0 1.2924216342232249 1.2924216342232249 0.0
          2.5 1.0973860297375746 1.0973860297375746 0.0
          3.0 1.027393469477902 1.027393469477902 0.0
          3.5 1.0061492039598166 1.0061492039598166 0.0
          4.0 1.0010718572754302 1.0010718572754302 0.0
          4.5 1.0001438753446412 1.0001438753446412 0.0
          5.0 1.0000148674247078 1.0000148674247078 0.0
          5.5 1.0000011846750532 1.0000011846750532 0.0
          6.0 1.0000000729105996 1.0000000729105996 0.0
          6.5 1.0000000034704237 1.0000000034704237 0.0
          7.0 1.0000000001278861 1.0000000001278861 0.0
          7.5 1.0000000000036515 1.0000000000036515 0.0
          8.0 1.0000000000000808 1.0000000000000808 0.0
          8.5 1.0000000000000016 1.0000000000000016 0.0
          9.0 1.0 1.0 0.0
          9.5 1.0 1.0 0.0
          10.0 1.0 1.0 0.0
          10.5 1.0 1.0 0.0
          

          Show
          plazas Andrés Alejandro Plazas Malagón added a comment - - edited Chris's formula had a sign typo. Using: # Chris Waters's formula def sigmaClipCorrection(nSigClip): varFactor = 1.0 - ( 2 * nSigClip * norm.pdf(nSigClip)) / (norm.cdf(nSigClip) - norm.cdf( - nSigClip)) return varFactor #1.0 / np.sqrt(varFactor) the two formulas agree ( varFactor above is equivalent to 1/sqrt(f) , where f is the formula in Hirata&Choi2020): Sigma | Hirata&Choi | Waters | Fractional Residual (%) 2.0 1.2924216342232249 1.2924216342232249 0.0 2.5 1.0973860297375746 1.0973860297375746 0.0 3.0 1.027393469477902 1.027393469477902 0.0 3.5 1.0061492039598166 1.0061492039598166 0.0 4.0 1.0010718572754302 1.0010718572754302 0.0 4.5 1.0001438753446412 1.0001438753446412 0.0 5.0 1.0000148674247078 1.0000148674247078 0.0 5.5 1.0000011846750532 1.0000011846750532 0.0 6.0 1.0000000729105996 1.0000000729105996 0.0 6.5 1.0000000034704237 1.0000000034704237 0.0 7.0 1.0000000001278861 1.0000000001278861 0.0 7.5 1.0000000000036515 1.0000000000036515 0.0 8.0 1.0000000000000808 1.0000000000000808 0.0 8.5 1.0000000000000016 1.0000000000000016 0.0 9.0 1.0 1.0 0.0 9.5 1.0 1.0 0.0 10.0 1.0 1.0 0.0 10.5 1.0 1.0 0.0
          Hide
          mfisherlevine Merlin Fisher-Levine added a comment -

          One minor comment, just double-check the squaring and you're good to merge

          Show
          mfisherlevine Merlin Fisher-Levine added a comment - One minor comment, just double-check the squaring and you're good to merge
          Show
          plazas Andrés Alejandro Plazas Malagón added a comment - https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/33961/pipeline

            People

            Assignee:
            plazas Andrés Alejandro Plazas Malagón
            Reporter:
            mfisherlevine Merlin Fisher-Levine
            Reviewers:
            Merlin Fisher-Levine
            Watchers:
            Andrés Alejandro Plazas Malagón, Merlin Fisher-Levine, Nicholas Mondrik, Seth Digel
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: