# Add option to PTC task to correct for sigma clipping bias

XMLWordPrintable

#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
3
• Team:
Data Release Production
• Urgent?:
No

#### Description

When sigma-clipping to calculate the mean and variances, biases are potentially introduced at both the high and low flux ends, and should be corrected for somehow.

It's not clear how to do this well, but this ticket notes at least considering how we should approach it.

#### Attachments

1. correction_factor_hiarata_and_choi_BFE_2020.png
16 kB
2. screenshot-1.png
10 kB

#### Activity

Hide
Andrés Alejandro Plazas Malagón added a comment -

Comparing the two formulas, there are differences > 1% when sigma < 2.5:

 print ("Sigma | Hirata&Choi | Waters | Fractional Residual (%)") for sigma in nSigmaClip:  v1 = 1./np.sqrt(f(sigma))  v2 = sigmaClipCorrection(sigma)  print (sigma, 1./np.sqrt(f(sigma) ), sigmaClipCorrection(sigma), 100*(v1/v2 - 1)) 

 Sigma | Hirata&Choi | Waters | Fractional Residual (%) 2.0 1.2924216342232249 1.2262586964500768 5.395512216523701 2.5 1.0973860297375746 1.0887436390646081 0.7937948257857652 3.0 1.027393469477902 1.0266630753374586 0.07114253526683267 3.5 1.0061492039598166 1.0061116223474766 0.0037353322936750644 4.0 1.0010718572754302 1.0010707096275262 0.00011464204205946515 4.5 1.0001438753446412 1.0001438546475043 2.0694159941214707e-06 5.0 1.0000148674247078 1.0000148672036708 2.2103363583880764e-08 5.5 1.0000011846750532 1.0000011846736496 1.403543947731123e-10 6.0 1.0000000729105996 1.0000000729105945 5.10702591327572e-13 6.5 1.0000000034704237 1.0000000034704235 2.220446049250313e-14 7.0 1.0000000001278861 1.0000000001278861 0.0 7.5 1.0000000000036515 1.0000000000036515 0.0 8.0 1.0000000000000808 1.0000000000000808 0.0 8.5 1.0000000000000016 1.0000000000000013 2.220446049250313e-14 9.0 1.0 1.0 0.0 9.5 1.0 1.0 0.0 10.0 1.0 1.0 0.0 10.5 1.0 1.0 0.0 

Hirata&Choi2020 Hirata and Choi ignore second-order terms and higher in the correlation coefficient (see formula above).

 # Hirata and Choi formula  def f(nSigClip):  epsilon = 1.0 - norm.cdf(nSigClip)  return (1 - np.sqrt(2/np.pi)*((nSigClip*np.exp(-nSigClip**2/2))/(1 - 2*epsilon)))**2 

 # Chris Waters's formula def sigmaClipCorrection(nSigClip):  varFactor = 1.0 + (2 * nSigClip * norm.pdf(nSigClip)) / (norm.cdf(nSigClip) - norm.cdf(-nSigClip))  return varFactor #1.0 / np.sqrt(varFactor) 

Show
Andrés Alejandro Plazas Malagón added a comment - Comparing the two formulas, there are differences > 1% when sigma < 2.5: print ( "Sigma | Hirata&Choi | Waters | Fractional Residual (%)" ) for sigma in nSigmaClip: v1 = 1. / np.sqrt(f(sigma)) v2 = sigmaClipCorrection(sigma) print (sigma, 1. / np.sqrt(f(sigma) ), sigmaClipCorrection(sigma), 100 * (v1 / v2 - 1 )) Sigma | Hirata&Choi | Waters | Fractional Residual (%) 2.0 1.2924216342232249 1.2262586964500768 5.395512216523701 2.5 1.0973860297375746 1.0887436390646081 0.7937948257857652 3.0 1.027393469477902 1.0266630753374586 0.07114253526683267 3.5 1.0061492039598166 1.0061116223474766 0.0037353322936750644 4.0 1.0010718572754302 1.0010707096275262 0.00011464204205946515 4.5 1.0001438753446412 1.0001438546475043 2.0694159941214707e-06 5.0 1.0000148674247078 1.0000148672036708 2.2103363583880764e-08 5.5 1.0000011846750532 1.0000011846736496 1.403543947731123e-10 6.0 1.0000000729105996 1.0000000729105945 5.10702591327572e-13 6.5 1.0000000034704237 1.0000000034704235 2.220446049250313e-14 7.0 1.0000000001278861 1.0000000001278861 0.0 7.5 1.0000000000036515 1.0000000000036515 0.0 8.0 1.0000000000000808 1.0000000000000808 0.0 8.5 1.0000000000000016 1.0000000000000013 2.220446049250313e-14 9.0 1.0 1.0 0.0 9.5 1.0 1.0 0.0 10.0 1.0 1.0 0.0 10.5 1.0 1.0 0.0 Hirata&Choi2020 Hirata and Choi ignore second-order terms and higher in the correlation coefficient (see formula above). # Hirata and Choi formula def f(nSigClip): epsilon = 1.0 - norm.cdf(nSigClip) return ( 1 - np.sqrt( 2 / np.pi) * ((nSigClip * np.exp( - nSigClip * * 2 / 2 )) / ( 1 - 2 * epsilon))) * * 2 # Chris Waters's formula def sigmaClipCorrection(nSigClip): varFactor = 1.0 + ( 2 * nSigClip * norm.pdf(nSigClip)) / (norm.cdf(nSigClip) - norm.cdf( - nSigClip)) return varFactor #1.0 / np.sqrt(varFactor)
Hide
Andrés Alejandro Plazas Malagón added a comment - - edited

Chris Waters points out that the difference at low sigma appears to be betwee his use of f ~ 1/ (1 + x) and H&C20's use of f ~ 1 - x, for small x.

We'll stick with the existing code that Chris has in the crosstalk algorithm (moving it to utils and then import it from there).

Show
Andrés Alejandro Plazas Malagón added a comment - - edited Chris Waters points out that the difference at low sigma appears to be betwee his use of f ~ 1/ (1 + x) and H&C20's use of f ~ 1 - x , for small x . We'll stick with the existing code that Chris has in the crosstalk algorithm (moving it to utils and then import it from there).
Hide
Andrés Alejandro Plazas Malagón added a comment - - edited

Chris's formula had a sign typo. Using:

 # Chris Waters's formula def sigmaClipCorrection(nSigClip):  varFactor = 1.0 - (2 * nSigClip * norm.pdf(nSigClip)) / (norm.cdf(nSigClip) - norm.cdf(-nSigClip))  return varFactor #1.0 / np.sqrt(varFactor) 

the two formulas agree (varFactor above is equivalent to 1/sqrt(f), where f is the formula in Hirata&Choi2020):

 Sigma | Hirata&Choi | Waters | Fractional Residual (%) 2.0 1.2924216342232249 1.2924216342232249 0.0 2.5 1.0973860297375746 1.0973860297375746 0.0 3.0 1.027393469477902 1.027393469477902 0.0 3.5 1.0061492039598166 1.0061492039598166 0.0 4.0 1.0010718572754302 1.0010718572754302 0.0 4.5 1.0001438753446412 1.0001438753446412 0.0 5.0 1.0000148674247078 1.0000148674247078 0.0 5.5 1.0000011846750532 1.0000011846750532 0.0 6.0 1.0000000729105996 1.0000000729105996 0.0 6.5 1.0000000034704237 1.0000000034704237 0.0 7.0 1.0000000001278861 1.0000000001278861 0.0 7.5 1.0000000000036515 1.0000000000036515 0.0 8.0 1.0000000000000808 1.0000000000000808 0.0 8.5 1.0000000000000016 1.0000000000000016 0.0 9.0 1.0 1.0 0.0 9.5 1.0 1.0 0.0 10.0 1.0 1.0 0.0 10.5 1.0 1.0 0.0 

Show
Andrés Alejandro Plazas Malagón added a comment - - edited Chris's formula had a sign typo. Using: # Chris Waters's formula def sigmaClipCorrection(nSigClip): varFactor = 1.0 - ( 2 * nSigClip * norm.pdf(nSigClip)) / (norm.cdf(nSigClip) - norm.cdf( - nSigClip)) return varFactor #1.0 / np.sqrt(varFactor) the two formulas agree ( varFactor above is equivalent to 1/sqrt(f) , where f is the formula in Hirata&Choi2020): Sigma | Hirata&Choi | Waters | Fractional Residual (%) 2.0 1.2924216342232249 1.2924216342232249 0.0 2.5 1.0973860297375746 1.0973860297375746 0.0 3.0 1.027393469477902 1.027393469477902 0.0 3.5 1.0061492039598166 1.0061492039598166 0.0 4.0 1.0010718572754302 1.0010718572754302 0.0 4.5 1.0001438753446412 1.0001438753446412 0.0 5.0 1.0000148674247078 1.0000148674247078 0.0 5.5 1.0000011846750532 1.0000011846750532 0.0 6.0 1.0000000729105996 1.0000000729105996 0.0 6.5 1.0000000034704237 1.0000000034704237 0.0 7.0 1.0000000001278861 1.0000000001278861 0.0 7.5 1.0000000000036515 1.0000000000036515 0.0 8.0 1.0000000000000808 1.0000000000000808 0.0 8.5 1.0000000000000016 1.0000000000000016 0.0 9.0 1.0 1.0 0.0 9.5 1.0 1.0 0.0 10.0 1.0 1.0 0.0 10.5 1.0 1.0 0.0
Hide
Merlin Fisher-Levine added a comment -

One minor comment, just double-check the squaring and you're good to merge

Show
Merlin Fisher-Levine added a comment - One minor comment, just double-check the squaring and you're good to merge
Hide
Andrés Alejandro Plazas Malagón added a comment -
Show
Andrés Alejandro Plazas Malagón added a comment - https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/33961/pipeline

#### People

Assignee:
Andrés Alejandro Plazas Malagón
Reporter:
Merlin Fisher-Levine
Reviewers:
Merlin Fisher-Levine
Watchers:
Andrés Alejandro Plazas Malagón, Merlin Fisher-Levine, Nicholas Mondrik, Seth Digel