Fix Version/s: None
`ObjectSizeStarSelector` can produce the following numpy warning:
`RuntimeWarning: invalid value encountered in less`
This occurs at the following point in the code:
for i in range(nCluster):
    # Only compute func if some points are available; otherwise, default to NaN.
    pointsInCluster = (clusterId == i)
    centers[i] = func(yvec[pointsInCluster])
where `func` has been assigned to `numpy.mean`. When I have seen this occur, I have found that `dist` is an array of `nan` values.
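A minimal reconstruction of the failure mode (the array values and cluster layout here are made up for illustration): a cluster with no members makes `yvec[pointsInCluster]` empty, `numpy.mean` of an empty array returns NaN with a RuntimeWarning, and the NaN then propagates into later comparisons such as the one that produces "invalid value encountered in less". A guard on the selection avoids the warning and makes the NaN default explicit:

```python
import numpy as np

nCluster = 3
clusterId = np.array([0, 0, 2, 2])   # note: no points fall in cluster 1
yvec = np.array([1.0, 2.0, 3.0, 4.0])
func = np.mean

centers = np.empty(nCluster)
for i in range(nCluster):
    pointsInCluster = (clusterId == i)
    if pointsInCluster.any():
        centers[i] = func(yvec[pointsInCluster])
    else:
        # Explicit default instead of letting numpy.mean warn on an empty slice.
        centers[i] = np.nan

print(centers)  # [1.5, nan, 3.5]
```

Without the `if pointsInCluster.any()` check, the empty cluster 1 would trigger a "Mean of empty slice" RuntimeWarning and still yield NaN, just less visibly.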
I suggest that the star selector handle this situation more gracefully, e.g. by raising an appropriate exception or handling the data in an appropriate way. If logging a message would be helpful, then please do that (and if RFC-154 is adopted, a log will be available).
One way to reproduce this is to run `tests/testProcessCcd.py` in `pipe_tasks`. However, I often see it when running `processCcd.py` on other data, as well.
Do we want to change `numpy.mean` -> `numpy.nanmean` and `numpy.median` -> `numpy.nanmedian`?
Vishal Kasliwal [X], to answer your question about `numpy.nanmean` and `numpy.nanmedian`: I think you need to understand what the code is doing. What do NaNs indicate? Should NaNs be ignored, should they cause the mean and median to be NaN (and if so, then what should happen?), or should they cause the algorithm to fail at that point?
Here's my understanding:
In the case of the SingleFrameVariancePlugin, we are computing the median of the variance over the heavy footprint of a source. We check the mask array of the pixels in the heavy footprint of the source against the mask corresponding to flags that indicate potentially bad sources (by default these are "DETECTED", "DETECTED_NEGATIVE", "BAD", and "SAT"). We only compute the median over the pixels that don't have those flags set, so we should never expect to get NaNs in the variance plane of the pixels we are computing the median of. Therefore, we should not have to use `numpy.nanmedian`. In fact, we should not use `numpy.nanmedian`, because if we do somehow end up with one, some, or all NaN values in the variance plane, something else has gone wrong, and the NaN result is alerting us to that event.
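The masking logic described above can be sketched as follows; the bit values and pixel data here are hypothetical stand-ins, not the actual afw mask-plane assignments:

```python
import numpy as np

# Hypothetical flag bit values for illustration only.
DETECTED, DETECTED_NEGATIVE, BAD, SAT = 0x1, 0x2, 0x4, 0x8
badBits = DETECTED | DETECTED_NEGATIVE | BAD | SAT

# Made-up variance-plane and mask-plane pixels within a source's footprint.
variance = np.array([2.0, 3.0, 100.0, 4.0])
mask = np.array([0, 0, SAT, DETECTED_NEGATIVE])

# Keep only pixels with none of the bad flag bits set, then take a plain
# median: any NaN surviving the mask would signal a real upstream bug.
good = (mask & badBits) == 0
medVar = np.median(variance[good])
print(medVar)  # 2.5
```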
Does that sound correct or am I misunderstanding the algorithm?
In conclusion: keep `numpy.median` rather than `numpy.nanmedian`.
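The trade-off being decided here is easy to see in a two-line comparison: `numpy.median` lets a NaN poison the result (surfacing the problem), while `numpy.nanmedian` silently drops it:

```python
import numpy as np

vals = np.array([1.0, 2.0, 3.0, np.nan])

m = np.median(vals)      # NaN propagates into the result
nm = np.nanmedian(vals)  # NaN is silently ignored

print(m, nm)  # nan 2.0
```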
Can you please review this ticket? There are two commits. The first commit is a simple fix to make the unit tests in testPsfDetermination.py use both StarSelector tasks in every test. I decided to add this work to the ticket after checking with Jim Bosch that ideally both StarSelectors should be used in the tests. The second commit is the work requested by this ticket. Thanks!
Overall the solution looks fine, assuming NumPy warnings are really activated and we are sure that they should be logged as warnings and not errors, which is hard for me to judge.
I was asked for more information about what I want, so here goes....
I feel that the warning messages from numpy are basically useless because the user has no idea where they are coming from or what the problem is.
The following are all possible better ways to handle this, depending on what problem is causing the warning:
Also, some of these numpy warnings are being caused by not properly identifying problems before calling numpy (e.g. passing an empty array). If the error condition is easy to test before calling numpy then I suggest doing that. I doubt all such conditions can be caught in advance, but when they can, that's a good thing to do.
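A sketch of the pre-call check suggested above; `safe_mean` and its `log` parameter are hypothetical names, not anything in the codebase:

```python
import numpy as np

def safe_mean(arr, log=None):
    """Check the easy error condition (an empty array) before calling numpy.

    Hypothetical helper for illustration; `log` is any object with a
    warn() method, or None to skip logging.
    """
    arr = np.asarray(arr)
    if arr.size == 0:
        if log is not None:
            log.warn("safe_mean called with an empty array; returning NaN")
        return float("nan")  # explicit, documented default instead of a numpy warning
    return float(np.mean(arr))

print(safe_mean([1.0, 2.0, 3.0]))  # 2.0
print(safe_mean([]))               # nan, with a log message instead of a RuntimeWarning
```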
For logging: all tasks already contain a log attribute. If you are catching the issue in non-task code (such as a measurement plugin) then you can easily get a log object from pex_logging and log to that. I suggest creating it in the constructor so all methods of the plugin have access to it.
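The create-the-log-in-the-constructor pattern could look like the following; this uses Python's standard `logging` module as a stand-in for pex_logging, and the class, method, and logger names are all hypothetical:

```python
import logging

class VarianceMeasurementPlugin:
    """Sketch of a measurement plugin that owns its own log object."""

    def __init__(self):
        # Create the log once in the constructor so every method can use it.
        self.log = logging.getLogger("measurement.variancePlugin")

    def measure(self, variance_pixels):
        # Catch the problem before numpy would warn about an empty slice.
        if len(variance_pixels) == 0:
            self.log.warning("empty footprint; skipping variance measurement")
            return None
        return sum(variance_pixels) / len(variance_pixels)

plugin = VarianceMeasurementPlugin()
print(plugin.measure([1.0, 3.0]))  # 2.0
print(plugin.measure([]))          # None (plus a logged warning)
```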