Fix Version/s: None
Sprint: AP F21-2 (July)
The DPDD requires us to compute the chi squared of a point source model fit on difference imaging residuals. Develop a chi squared calculation for the PSF model fit that can be run on the residuals, with the result added to the source catalog. This may need to be run in the C++ layer, at the point where the measurement is made.
Note: See discussion about this topic here: https://lsstc.slack.com/archives/C2JPMCF5X/p1624291520356400
I believe the relevant entries that define what we need to produce here are in the various source table definitions in the DPDD. For example:
psLnL: Natural log likelihood of the observed data given the point source model.
psChi2: chi2 statistic of the model fit.
psNdata: The number of data points (pixels) used to fit the model.
I believe that is the value discussed in the ticket description, as opposed to psFluxChi2, which is the "chi2 statistic for the scatter of psFlux around psFluxMean". psFluxChi2 is only computed for DiaObjects, and I believe it already exists.
For this ticket, I will compute and save the chi2 and number of included pixels (this might already just be the `*_area` value). Those can be output to DPDD quantities on another ticket.
Whether we want to compute a log-likelihood here is a good question: typically that's just a variation on the reduced chi2, so it may not be very informative without incorporating more knowledge of the pixel covariances.
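To illustrate why the log-likelihood adds little beyond the chi2 in this case, here is a minimal sketch. It assumes independent Gaussian pixel noise with known per-pixel variances and no pixel covariances, which is an assumption on my part and not what the DPDD specifies. Under that assumption lnL is -0.5 * chi2 plus a normalization term that depends only on the variance plane. The function name `gaussian_log_likelihood` is hypothetical and not part of meas_base.

```python
import numpy as np


def gaussian_log_likelihood(chi2, variance):
    """Log-likelihood for independent Gaussian pixels (hypothetical helper).

    lnL = -0.5 * (chi2 + sum_i ln(2 * pi * sigma_i**2))

    `variance` holds the per-pixel variances sigma_i**2 of the pixels that
    went into `chi2`. Pixel covariances are assumed to be zero.
    """
    variance = np.asarray(variance, dtype=float)
    # Normalization term: depends on the variances only, not on the model.
    norm = np.sum(np.log(2.0 * np.pi * variance))
    return -0.5 * (chi2 + norm)
```

For a fixed variance plane, lnL and chi2 differ only by a constant offset and a factor of -0.5, so lnL would only carry new information if the covariances were included.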
I've attached a histogram of the normalized chi2s (chi2/npix, which should maybe be npix-1 but that doesn't really change the result) computed with the new code for the gen3_rc2_subset data that I ran through on my desktop. There's a definite per-filter difference, ranging from ~18 for HSC-G to ~40 with a long tail for HSC-Y. I'm not sure what to make of these results, assuming I got the math right. This will be something to think about during the review.
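On the npix versus npix-1 question, a minimal sketch, assuming the PSF flux is the only fitted parameter so that exactly one degree of freedom is removed. `reduced_chi2` and `n_fit` are hypothetical names used for illustration only.

```python
def reduced_chi2(chi2, npix, n_fit=1):
    """Return chi2 per degree of freedom, with dof = npix - n_fit.

    n_fit=1 assumes the flux is the only free parameter of the fit;
    n_fit=0 reproduces the plain chi2/npix normalization.
    """
    dof = npix - n_fit
    if dof <= 0:
        raise ValueError("need more pixels than fitted parameters")
    return chi2 / dof
```

For npix = 400, for example, the two normalizations differ by a factor of 400/399, or about 0.25%.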
Krzysztof Findeisen: do you mind doing this small-ish review (~50 lines) of mixed C++ and Python (the test code)? I don't know if there's a more optimal way to do the C++ calculation.
Eric Bellm: can you please ponder the plot I've uploaded and see if you agree with my math on the ticket? You can look at the calculation in the python test code if you don't want to bother with the C++.
It looks like the new calculations here increase the PsfFluxAlgorithm runtime by about 10% (roughly 52ms->58ms to run on 10k PSFs on my desktop), which is probably irrelevant in the grand scheme of things.
Post-rebase and ap_association fix Jenkins: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/34664/pipeline
Hi John Parejko, indeed there is a math error: for chi-squared we want `sum(((yi - fi)/sigma_i)**2)`. The current code has `sum((yi - fi)/sigma_i)`, since you have taken the square root of the variance plane.
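For reference, the intended calculation can be sketched in numpy as follows. This is an illustration under stated assumptions, not the code on the ticket branch: the function name `psf_chi2`, its arguments, and the boolean `mask` convention (True means use the pixel) are all hypothetical, and plain arrays stand in for the image, variance plane, and PSF model image.

```python
import numpy as np


def psf_chi2(image, variance, psf_model, flux, mask=None):
    """Return (chi2, npix) for a flux-scaled PSF model (hypothetical helper).

    chi2 = sum_i ((y_i - f_i) / sigma_i)**2, with f_i = flux * psf_model_i
    and sigma_i**2 taken from the variance plane.
    """
    image = np.asarray(image, dtype=float)
    variance = np.asarray(variance, dtype=float)
    psf_model = np.asarray(psf_model, dtype=float)
    if mask is None:
        mask = np.ones(image.shape, dtype=bool)
    # Skip pixels with non-positive variance to avoid dividing by zero.
    good = mask & (variance > 0)
    residual = image[good] - flux * psf_model[good]
    # Dividing the squared residual by the variance is the same as squaring
    # (residual / sigma); no square root of the variance plane is needed.
    chi2 = np.sum(residual**2 / variance[good])
    return float(chi2), int(good.sum())
```

Dividing the squared residual by the variance directly avoids the square root altogether, which sidesteps the kind of bug described above.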
Attached an updated plot, after fixing the sqrt bug that Eric reported. That looks much more like what I'd expect. Note that these plots are just for DRP output, not diffims.
I wonder if it could be computed in https://github.com/lsst/meas_base/blob/master/src/PsfFlux.cc as part of PsfFluxAlgorithm::measure.