I was able to get good results on basically the whole focal plane with FULLCOVARIANCE. All CCDs passed and only 3 amps fell out, the same three that failed with EXPAPPROX, including the two known dead amps. These results and plots are in /project/shared/BOT/rerun/cslage/PTC_LSSTCAM_FullCov_12673A . Below is a list of proposed changes. Some of these I think should go in, but some others we need to discuss:
(1) Most of the failures were caused by the following error. We were already testing for np.sum(w) == 0, which is basically the number of good pixels in the flat pair, at this step: https://github.com/lsst/cp_pipe/blob/f4cdebacd2b778bffd6c5ba5ba4fb438b4052ce2/python/lsst/cp/pipe/ptc.py#L683
But it turns out that if the number of good pixels is small (fewer than a few hundred or maybe a few thousand), then the FFT routine that calculates the covariances fails because some of the nPix values come out zero. So I changed the np.sum(w) == 0 test to np.sum(w) < 10000. This number could perhaps be smaller, but if a flat pair has fewer than 10,000 good pixels, we probably don't want to include it anyway. This fixed most of the failures.
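To make the proposed check in (1) concrete, here is a minimal sketch. It assumes `w` is a 0/1 weight array (1 = good pixel), and the helper name `has_enough_good_pixels` is made up for illustration; it is not the cp_pipe code.

```python
import numpy as np

# Threshold proposed in (1); it could perhaps be smaller.
MIN_GOOD_PIXELS = 10000


def has_enough_good_pixels(w, min_good_pixels=MIN_GOOD_PIXELS):
    """Return True if the weight array has enough good pixels to use the flat pair.

    Hypothetical helper: `w` is assumed to be 1 for good pixels and 0 for
    masked ones, so np.sum(w) counts the good pixels.
    """
    return bool(np.sum(w) >= min_good_pixels)


# A flat pair with exactly 10,000 good pixels passes; one fewer is skipped.
w = np.ones((100, 100))
print(has_enough_good_pixels(w))
w[0, 0] = 0
print(has_enough_good_pixels(w))
```

The old `== 0` test is the special case `min_good_pixels=1`, so the change only moves the cut higher.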
(2) I realized that when we put in limits for the EXPAPPROX fit, we put in +/-100 for the 3rd parameter, because I thought this was the noise. But it turns out this is the noise^2. Since the noise can be as high as ~40, noise^2 can reach ~1600, so I changed this to +/-2000.
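The arithmetic behind (2), as a small sketch. The helper name `bound_covers_noise` is hypothetical; the only inputs taken from this comment are the noise of ~40 and the two limits.

```python
def bound_covers_noise(bound, max_noise):
    """Return True if a +/-bound limit on noise^2 admits the given noise level.

    Hypothetical helper for illustration: the 3rd fit parameter is noise^2,
    so the limit has to be compared against max_noise squared.
    """
    return max_noise ** 2 <= bound


max_noise = 40.0  # approximate upper end of the noise quoted in (2)
print(max_noise ** 2)                       # 1600.0
print(bound_covers_noise(100.0, max_noise))   # old limit
print(bound_covers_noise(2000.0, max_noise))  # new limit
```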
(3) One of the CCDs was still failing in the wres routine at this step: https://github.com/lsst/cp_pipe/blob/f4cdebacd2b778bffd6c5ba5ba4fb438b4052ce2/python/lsst/cp/pipe/astierCovPtcFit.py#L528. The failure was caused by self.maskMu being all NaNs. I put in a try/except, as follows. This clearly isn't a long-term fix, but fixed the problem. We need to understand how it gets to be all NaNs.

    try:
        maskedWeightedRes = weightedRes[self.maskMu]
    except:
        maskedWeightedRes = weightedRes * 0.0
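As a possible step toward understanding the all-NaN case, the bare try/except could be replaced by an explicit check that also reports when the fallback was taken. This is only a sketch: the function name and the returned flag are made up, and it assumes the mask is either a boolean array or a float array that has become all NaN.

```python
import numpy as np


def masked_weighted_residuals(weighted_res, mask_mu):
    """Apply the mask, or fall back to zeros if the mask is all NaN.

    Hypothetical stand-in for the try/except above. Returns
    (residuals, mask_was_usable) so the caller can log the bad case.
    """
    weighted_res = np.asarray(weighted_res, dtype=float)
    mask_arr = np.asarray(mask_mu)
    # Assumption: a broken mask shows up as a float array full of NaNs.
    if mask_arr.dtype.kind == "f" and np.all(np.isnan(mask_arr)):
        return weighted_res * 0.0, False
    return weighted_res[mask_arr], True


res = np.array([1.0, 2.0, 3.0])
print(masked_weighted_residuals(res, np.array([True, False, True])))
print(masked_weighted_residuals(res, np.full(3, np.nan)))
```

Returning the flag rather than silently substituting zeros would at least make it possible to record which amps hit this path.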
(4) Many of the single amp failures were caused by the code just rejecting too many points in the iterative outlier rejection routine. I think this routine doesn't make sense. Once it has rejected a point, that point can never be recovered. What happens is illustrated in the following sketch. The first iteration, in red, rejects Pt 6 (correctly), but also rejects Pt 4 and Pt 5. The second iteration is in green, but Pt 4 and Pt 5, which are good fits, have already been rejected and can't be recovered. So I propose changing it so that points can be recovered if they are good fits in subsequent iterations. We need to discuss this.
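To make the proposal in (4) concrete for discussion, here is a toy sketch using a straight-line fit rather than the PTC model. The function name, the `allow_recovery` flag, and the 3-sigma cut are all assumptions for illustration. The only difference between the two behaviors is whether the new mask is ANDed with the previous one.

```python
import numpy as np


def fit_line_with_rejection(x, y, n_sigma=3.0, max_iter=10,
                            initial_good=None, allow_recovery=True):
    """Iteratively fit a line, rejecting points beyond n_sigma.

    Toy illustration only. With allow_recovery=True the mask is rebuilt
    from all points on every iteration, so a previously rejected point
    that agrees with the current fit comes back.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if initial_good is None:
        good = np.ones(x.shape, dtype=bool)
    else:
        good = np.asarray(initial_good, dtype=bool).copy()
    for _ in range(max_iter):
        coeffs = np.polyfit(x[good], y[good], 1)
        residuals = y - np.polyval(coeffs, x)   # evaluated on ALL points
        sigma = np.std(residuals[good])         # scatter of the fitted points
        within = np.abs(residuals) < n_sigma * sigma
        # Old behavior: once rejected, always rejected (good & within).
        new_good = within if allow_recovery else (good & within)
        if np.array_equal(new_good, good) or new_good.sum() < 3:
            break
        good = new_good
    coeffs = np.polyfit(x[good], y[good], 1)    # final fit on the final mask
    return coeffs, good


x = np.arange(20)
y = 2.0 * x + 1.0 + np.where(x % 2 == 0, 0.5, -0.5)  # line plus small scatter
y[10] += 100.0                                        # one real outlier
start = np.ones(20, dtype=bool)
start[[3, 10]] = False                                # point 3 wrongly pre-rejected
_, sticky = fit_line_with_rejection(x, y, initial_good=start, allow_recovery=False)
_, recovered = fit_line_with_rejection(x, y, initial_good=start, allow_recovery=True)
print(sticky[3], recovered[3], recovered[10])
```

In this toy case the wrongly rejected point stays out with the sticky mask and returns with the recoverable one, while the real outlier stays rejected either way.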
I'm concerned that the PTC code is becoming increasingly difficult to understand. I'm hopeful that the gen3 rewrite will help somewhat.
I'd also suggest rebasing the existing commits into a simpler set that clearly defines what changes have been made. It might also be good to move the travis/lint change to either the beginning or end of the commit chain so the PTC commits are together.