# ObjectSizeStarSelector can produce numpy warnings


#### Details

• Type: Bug
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
2
• Sprint:
DRP X16-3
• Team:
Data Release Production

#### Description

ObjectSizeStarSelector can produce the following numpy warning:

    RuntimeWarning: invalid value encountered in less

This occurs at the following point in the code:

    for i in range(nCluster):
        # Only compute func if some points are available; otherwise, default to NaN.
        pointsInCluster = (clusterId == i)
        if numpy.any(pointsInCluster):
            centers[i] = func(yvec[pointsInCluster])

where `func` has been assigned to `numpy.mean`. When I have seen this occur, I have found that `dist` is an array of NaN values.

I suggest that the star selector handle this situation more gracefully, e.g. by raising an exception with an appropriate message or by handling the data in an appropriate way. If logging a message would be helpful, then please do that (and if RFC-154 is adopted, a log will be available).
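One graceful option is to keep the existing per-cluster loop but guard against both empty clusters and non-finite inputs before calling `func`, leaving NaN as an explicit, testable default. This is a hypothetical sketch, not the actual ObjectSizeStarSelector code; the function name and signature are illustrative.

```python
import numpy


def cluster_centers(yvec, clusterId, nCluster, func=numpy.mean):
    """Hypothetical sketch: compute per-cluster centers, defaulting to NaN
    for empty or non-finite clusters instead of letting numpy emit
    RuntimeWarnings."""
    centers = numpy.full(nCluster, numpy.nan)
    for i in range(nCluster):
        values = yvec[clusterId == i]
        # Only call func when the cluster has points and they are all finite;
        # otherwise the center stays NaN, which callers can test for.
        if values.size > 0 and numpy.all(numpy.isfinite(values)):
            centers[i] = func(values)
    return centers
```

Callers then check `numpy.isnan(centers)` and report a useful message (or raise) instead of relying on an anonymous numpy warning.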

One way to reproduce this is to run tests/testProcessCcd.py in pipe_tasks. However, I often see it when running processCcd.py on other data, as well.

#### Activity

Russell Owen added a comment (edited)

I feel that the warning messages from numpy are basically useless because the user has no idea where they come from or what the problem is.

The following are all possible better ways to handle this, depending on what problem is causing the warning:

• If the warning indicates an error that prevents further processing, then the code ought to raise an exception with a useful message.
• If the warning indicates a warning or an error that still allows further processing, then log a warning or error. Log messages can be much more informative than numpy warnings, and they carry context that tells users where the message came from.
• If the warning is even less important than that, it can be safely suppressed. An example is computing statistics on an array containing a few NaNs; the result will be NaN and you can test for that, so there is no need for a numpy warning as well.

Also, some of these numpy warnings are being caused by not properly identifying problems before calling numpy (e.g. passing an empty array). If the error condition is easy to test before calling numpy then I suggest doing that. I doubt all such conditions can be caught in advance, but when they can, that's a good thing to do.
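The third option above, suppressing an expected warning and testing the result explicitly, can be done with `numpy.errstate`. The array name `dist` is taken from the description; the values are illustrative.

```python
import numpy

# When NaNs are expected, suppress the warning locally and test for the
# NaN entries explicitly instead of relying on numpy's anonymous message.
dist = numpy.array([1.0, numpy.nan, 3.0])

with numpy.errstate(invalid="ignore"):
    # Comparing NaN with a number would otherwise emit
    # "RuntimeWarning: invalid value encountered in less".
    close = dist < 2.0

# NaN compares False in every comparison; identify the bad entries directly.
bad = numpy.isnan(dist)
```

Outside the `errstate` context the warning behaviour is unchanged, so unexpected invalid operations elsewhere are still reported.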

For logging: all tasks already contain a log attribute. If you are catching the issue in non-task code (such as a measurement plugin) then you can easily get a log object from pex_logging and log to that. I suggest creating it in the constructor so all methods of the plugin have access to it.
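As a rough illustration of creating the log in the constructor so all methods can use it, here is a stand-in sketch using Python's standard `logging` module. The class and logger names are invented; a real measurement plugin would obtain its log object from pex_logging rather than the standard library.

```python
import logging

import numpy


class VarianceMeasurementPlugin:
    """Illustrative stand-in, not LSST code: the log is created once in the
    constructor so every method of the plugin has access to it."""

    def __init__(self, name="measurement.variance"):
        self.log = logging.getLogger(name)

    def measure(self, values):
        if numpy.isnan(values).any():
            # A log message carries context that a bare numpy warning lacks.
            self.log.warning("NaN values encountered in %s", type(self).__name__)
```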

Vishal Kasliwal [X] (Inactive) added a comment (edited)

Do we want to change numpy.mean -> numpy.nanmean and numpy.median -> numpy.nanmedian?

Russell Owen added a comment

Vishal Kasliwal [X], to answer your question about numpy.nanmean and numpy.nanmedian: I think you need to understand what the code is doing. What do NaNs indicate? Should NaNs be ignored, should they cause the mean and median to be NaN (and if so, what should happen then?), or should they cause the algorithm to fail at that point?
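The trade-off behind this question can be shown in two lines: `numpy.mean` propagates NaN so the caller can detect the problem, while `numpy.nanmean` silently drops NaNs and hides it. The values here are illustrative.

```python
import numpy

values = numpy.array([1.0, 2.0, numpy.nan])

# numpy.mean propagates NaN: the result is NaN, which callers can test for.
assert numpy.isnan(numpy.mean(values))

# numpy.nanmean ignores the NaN entirely, averaging only the finite values,
# so any upstream problem that produced the NaN goes unnoticed.
assert numpy.nanmean(values) == 1.5
```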

Vishal Kasliwal [X] (Inactive) added a comment (edited)

Here's my understanding:
In the case of SingleFrameVariancePlugin, we compute the median of the variance over the heavy footprint of a source. We check the mask array of the pixels in the heavy footprint against the mask bits corresponding to flags that indicate potentially bad sources (by default "DETECTED", "DETECTED_NEGATIVE", "BAD", and "SAT"), and we compute the median only over pixels that have none of those flags set. Thus we should never expect NaNs in the variance plane of the pixels we take the median of, so we should not have to use numpy.nanmedian. In fact, we should not use numpy.nanmedian: if we somehow end up with one, some, or all NaN values in the variance plane, something else has gone wrong, and the NaN result alerts us to that.
Does that sound correct, or am I misunderstanding the algorithm?
In conclusion: keep numpy.median rather than numpy.nanmedian.
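The selection described above can be sketched as a bitmask filter before the median. This is a hypothetical illustration: the bit values and array contents are invented and do not correspond to the actual LSST mask plane definitions.

```python
import numpy

# Illustrative mask bits (not the real LSST mask plane values).
DETECTED, BAD, SAT = 0x1, 0x2, 0x4
badBits = DETECTED | BAD | SAT

# Per-pixel variance and mask over a source's heavy footprint (toy data).
variance = numpy.array([1.0, 2.0, 3.0, numpy.nan])
mask = numpy.array([0, DETECTED, 0, SAT])

# Keep only pixels with none of the bad flags set, then take a plain median.
good = (mask & badBits) == 0
medianVariance = numpy.median(variance[good])
```

If the filtered variance still contains NaN, `numpy.median` propagates it, which is exactly the alerting behaviour argued for above.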

Vishal Kasliwal [X] (Inactive) added a comment

Hi Pim,
Can you please review this ticket? There are two commits. The first commit is a simple fix to make the unit tests in testPsfDetermination.py use both StarSelector tasks in every test. I decided to add this work to the ticket after checking with Jim Bosch that ideally both StarSelectors should be used in the tests. The second commit is the work requested by this ticket. Thanks!

Pim Schellart [X] (Inactive) added a comment

Overall the solution looks fine, assuming the numpy warnings are actually raised and we are sure they really should be logged as warnings rather than errors, which is hard for me to judge.

Vishal Kasliwal [X] (Inactive) added a comment

Merged to master.


#### People

Assignee:
Vishal Kasliwal [X] (Inactive)
Reporter:
Russell Owen
Reviewers:
Pim Schellart [X] (Inactive)
Watchers:
Jim Bosch, Paul Price, Pim Schellart [X] (Inactive), Russell Owen, Vishal Kasliwal [X] (Inactive)