 # Add option to PTC task to correct for sigma clipping bias

XMLWordPrintable

#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
3
• Team:
Data Release Production
• Urgent?:
No

#### Description

When sigma-clipping to calculate the mean and variances, biases are potentially introduced at both the high and low flux ends, and should be corrected for somehow.

It's not clear how to do this well, but this ticket notes at least considering how we should approach it.

#### Activity

Comparing the two formulas, there are differences > 1% when sigma < 2.5:

 print ("Sigma | Hirata&Choi | Waters | Fractional Residual (%)") for sigma in nSigmaClip:  v1 = 1./np.sqrt(f(sigma))  v2 = sigmaClipCorrection(sigma)  print (sigma, 1./np.sqrt(f(sigma) ), sigmaClipCorrection(sigma), 100*(v1/v2 - 1)) 

 Sigma | Hirata&Choi | Waters | Fractional Residual (%) 2.0 1.2924216342232249 1.2262586964500768 5.395512216523701 2.5 1.0973860297375746 1.0887436390646081 0.7937948257857652 3.0 1.027393469477902 1.0266630753374586 0.07114253526683267 3.5 1.0061492039598166 1.0061116223474766 0.0037353322936750644 4.0 1.0010718572754302 1.0010707096275262 0.00011464204205946515 4.5 1.0001438753446412 1.0001438546475043 2.0694159941214707e-06 5.0 1.0000148674247078 1.0000148672036708 2.2103363583880764e-08 5.5 1.0000011846750532 1.0000011846736496 1.403543947731123e-10 6.0 1.0000000729105996 1.0000000729105945 5.10702591327572e-13 6.5 1.0000000034704237 1.0000000034704235 2.220446049250313e-14 7.0 1.0000000001278861 1.0000000001278861 0.0 7.5 1.0000000000036515 1.0000000000036515 0.0 8.0 1.0000000000000808 1.0000000000000808 0.0 8.5 1.0000000000000016 1.0000000000000013 2.220446049250313e-14 9.0 1.0 1.0 0.0 9.5 1.0 1.0 0.0 10.0 1.0 1.0 0.0 10.5 1.0 1.0 0.0 

Hirata&Choi2020 Hirata and Choi ignore second-order terms and higher in the correlation coefficient (see formula above).

 # Hirata and Choi formula  def f(nSigClip):  epsilon = 1.0 - norm.cdf(nSigClip)  return (1 - np.sqrt(2/np.pi)*((nSigClip*np.exp(-nSigClip**2/2))/(1 - 2*epsilon)))**2 

 # Chris Waters's formula def sigmaClipCorrection(nSigClip):  varFactor = 1.0 + (2 * nSigClip * norm.pdf(nSigClip)) / (norm.cdf(nSigClip) - norm.cdf(-nSigClip))  return varFactor #1.0 / np.sqrt(varFactor) 

Show Andrés Alejandro Plazas Malagón added a comment - Comparing the two formulas, there are differences > 1% when sigma < 2.5: print ( "Sigma | Hirata&Choi | Waters | Fractional Residual (%)" ) for sigma in nSigmaClip: v1 = 1. / np.sqrt(f(sigma)) v2 = sigmaClipCorrection(sigma) print (sigma, 1. / np.sqrt(f(sigma) ), sigmaClipCorrection(sigma), 100 * (v1 / v2 - 1 )) Sigma | Hirata&Choi | Waters | Fractional Residual (%) 2.0 1.2924216342232249 1.2262586964500768 5.395512216523701 2.5 1.0973860297375746 1.0887436390646081 0.7937948257857652 3.0 1.027393469477902 1.0266630753374586 0.07114253526683267 3.5 1.0061492039598166 1.0061116223474766 0.0037353322936750644 4.0 1.0010718572754302 1.0010707096275262 0.00011464204205946515 4.5 1.0001438753446412 1.0001438546475043 2.0694159941214707e-06 5.0 1.0000148674247078 1.0000148672036708 2.2103363583880764e-08 5.5 1.0000011846750532 1.0000011846736496 1.403543947731123e-10 6.0 1.0000000729105996 1.0000000729105945 5.10702591327572e-13 6.5 1.0000000034704237 1.0000000034704235 2.220446049250313e-14 7.0 1.0000000001278861 1.0000000001278861 0.0 7.5 1.0000000000036515 1.0000000000036515 0.0 8.0 1.0000000000000808 1.0000000000000808 0.0 8.5 1.0000000000000016 1.0000000000000013 2.220446049250313e-14 9.0 1.0 1.0 0.0 9.5 1.0 1.0 0.0 10.0 1.0 1.0 0.0 10.5 1.0 1.0 0.0 Hirata&Choi2020 Hirata and Choi ignore second-order terms and higher in the correlation coefficient (see formula above). # Hirata and Choi formula def f(nSigClip): epsilon = 1.0 - norm.cdf(nSigClip) return ( 1 - np.sqrt( 2 / np.pi) * ((nSigClip * np.exp( - nSigClip * * 2 / 2 )) / ( 1 - 2 * epsilon))) * * 2 # Chris Waters's formula def sigmaClipCorrection(nSigClip): varFactor = 1.0 + ( 2 * nSigClip * norm.pdf(nSigClip)) / (norm.cdf(nSigClip) - norm.cdf( - nSigClip)) return varFactor #1.0 / np.sqrt(varFactor)

Chris Waters points out that the difference at low sigma appears to be betwee his use of f ~ 1/ (1 + x) and H&C20's use of f ~ 1 - x, for small x.

We'll stick with the existing code that Chris has in the crosstalk algorithm (moving it to utils and then import it from there).

Show Andrés Alejandro Plazas Malagón added a comment - - edited Chris Waters points out that the difference at low sigma appears to be betwee his use of f ~ 1/ (1 + x) and H&C20's use of f ~ 1 - x , for small x . We'll stick with the existing code that Chris has in the crosstalk algorithm (moving it to utils and then import it from there).

Chris's formula had a sign typo. Using:

 # Chris Waters's formula def sigmaClipCorrection(nSigClip):  varFactor = 1.0 - (2 * nSigClip * norm.pdf(nSigClip)) / (norm.cdf(nSigClip) - norm.cdf(-nSigClip))  return varFactor #1.0 / np.sqrt(varFactor) 

the two formulas agree (varFactor above is equivalent to 1/sqrt(f), where f is the formula in Hirata&Choi2020):

 Sigma | Hirata&Choi | Waters | Fractional Residual (%) 2.0 1.2924216342232249 1.2924216342232249 0.0 2.5 1.0973860297375746 1.0973860297375746 0.0 3.0 1.027393469477902 1.027393469477902 0.0 3.5 1.0061492039598166 1.0061492039598166 0.0 4.0 1.0010718572754302 1.0010718572754302 0.0 4.5 1.0001438753446412 1.0001438753446412 0.0 5.0 1.0000148674247078 1.0000148674247078 0.0 5.5 1.0000011846750532 1.0000011846750532 0.0 6.0 1.0000000729105996 1.0000000729105996 0.0 6.5 1.0000000034704237 1.0000000034704237 0.0 7.0 1.0000000001278861 1.0000000001278861 0.0 7.5 1.0000000000036515 1.0000000000036515 0.0 8.0 1.0000000000000808 1.0000000000000808 0.0 8.5 1.0000000000000016 1.0000000000000016 0.0 9.0 1.0 1.0 0.0 9.5 1.0 1.0 0.0 10.0 1.0 1.0 0.0 10.5 1.0 1.0 0.0 

Show Andrés Alejandro Plazas Malagón added a comment - - edited Chris's formula had a sign typo. Using: # Chris Waters's formula def sigmaClipCorrection(nSigClip): varFactor = 1.0 - ( 2 * nSigClip * norm.pdf(nSigClip)) / (norm.cdf(nSigClip) - norm.cdf( - nSigClip)) return varFactor #1.0 / np.sqrt(varFactor) the two formulas agree ( varFactor above is equivalent to 1/sqrt(f) , where f is the formula in Hirata&Choi2020): Sigma | Hirata&Choi | Waters | Fractional Residual (%) 2.0 1.2924216342232249 1.2924216342232249 0.0 2.5 1.0973860297375746 1.0973860297375746 0.0 3.0 1.027393469477902 1.027393469477902 0.0 3.5 1.0061492039598166 1.0061492039598166 0.0 4.0 1.0010718572754302 1.0010718572754302 0.0 4.5 1.0001438753446412 1.0001438753446412 0.0 5.0 1.0000148674247078 1.0000148674247078 0.0 5.5 1.0000011846750532 1.0000011846750532 0.0 6.0 1.0000000729105996 1.0000000729105996 0.0 6.5 1.0000000034704237 1.0000000034704237 0.0 7.0 1.0000000001278861 1.0000000001278861 0.0 7.5 1.0000000000036515 1.0000000000036515 0.0 8.0 1.0000000000000808 1.0000000000000808 0.0 8.5 1.0000000000000016 1.0000000000000016 0.0 9.0 1.0 1.0 0.0 9.5 1.0 1.0 0.0 10.0 1.0 1.0 0.0 10.5 1.0 1.0 0.0

One minor comment, just double-check the squaring and you're good to merge Show Merlin Fisher-Levine added a comment - One minor comment, just double-check the squaring and you're good to merge

#### People

Assignee: Andrés Alejandro Plazas Malagón
Reporter: Merlin Fisher-Levine
Reviewers:
Merlin Fisher-Levine
Watchers:
Andrés Alejandro Plazas Malagón, Merlin Fisher-Levine, Nicholas Mondrik, Seth Digel