Data Management / DM-5428

ObjectSizeStarSelector can produce numpy warnings

    Details

    • Type: Bug
    • Status: Done
    • Priority: Major
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: meas_algorithms
    • Labels:
      None
    • Templates:
    • Story Points:
      2
    • Sprint:
      DRP X16-3
    • Team:
      Data Release Production

      Description

      `ObjectSizeStarSelector` can produce the following numpy warning:

      RuntimeWarning: invalid value encountered in less
      

      This occurs at the following point in the code:

              for i in range(nCluster):
                  # Only compute func if some points are available; otherwise, default to NaN.
                  pointsInCluster = (clusterId == i)
                  if numpy.any(pointsInCluster):
                      centers[i] = func(yvec[pointsInCluster])
      

      where `func` has been assigned to `numpy.mean`. When I have seen this occur, I have found that `dist` is an array of `nan`.
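For illustration, a guarded version of the loop quoted above (a hypothetical sketch, not the actual meas_algorithms code) makes the NaN default explicit and screens out non-finite values before calling `func`, so numpy has no occasion to warn:

```python
import numpy as np

def cluster_centers(yvec, clusterId, nCluster, func=np.mean):
    """Per-cluster centers, defaulting to NaN for empty clusters.

    Hypothetical sketch: non-finite values are screened out first,
    so func never sees NaN and numpy has nothing to warn about.
    """
    centers = np.full(nCluster, np.nan)
    good = np.isfinite(yvec)  # drop NaN/inf before any comparison
    for i in range(nCluster):
        pointsInCluster = good & (clusterId == i)
        if np.any(pointsInCluster):
            centers[i] = func(yvec[pointsInCluster])
        # else: centers[i] stays NaN, as the original comment intends
    return centers
```

With `yvec = [1.0, 2.0, nan]` and cluster ids `[0, 0, 1]`, this returns `[1.5, nan]` instead of emitting a RuntimeWarning.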

      I suggest that the star selector handle this situation more gracefully, e.g. by reporting an appropriate exception or handling the data in an appropriate way. If logging a message would be helpful, then please do that (and if RFC-154 is adopted, a log will be available).

      One way to reproduce this is to run `tests/testProcessCcd.py` in `pipe_tasks`. However, I often see it when running `processCcd.py` on other data, as well.

          Activity

          rowen Russell Owen added a comment (edited)

          I was asked for more information about what I want, so here goes....

          I feel that the warning messages from numpy are basically useless, because the user has no idea where they come from or what the problem is.

          The following are all possible better ways to handle this, depending on what problem is causing the warning:

          • If the warning indicates an error that prevents further processing then the code ought to raise an exception with a useful message.
          • If the warning indicates a warning or error that does allow further processing, then log a warning or error. Log messages can be much more informative than numpy warnings, and they also carry context that tells users where the message came from.
          • If the warning is even less important, then it can safely be suppressed. An example is computing statistics on an array containing a few NaNs; the result will be NaN and you can test for that, so there is no need for an additional warning from numpy.

          Also, some of these numpy warnings are caused by not properly identifying problems before calling numpy (e.g. passing an empty array). If the error condition is easy to test before calling numpy, then I suggest doing so. I doubt all such conditions can be caught in advance, but when they can, that is a good thing to do.
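The options above can be sketched as follows (hypothetical code, not from the stack): test the easy error condition up front and raise with a useful message, and when a NaN result is acceptable, suppress numpy's floating-point warning locally and test the result explicitly afterwards:

```python
import math
import numpy as np

def safe_mean(data):
    """Hypothetical sketch of the pre-check-then-suppress pattern."""
    # Catch the easy error condition before calling numpy.
    if data.size == 0:
        raise ValueError("cannot compute a mean: no points available")
    # Suppress numpy's floating-point warnings for this call only;
    # a NaN result is then detected and handled explicitly.
    with np.errstate(invalid="ignore"):
        result = np.mean(data)
    if math.isnan(result):
        pass  # here a real task would log a contextual warning
    return result
```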

          For logging: all tasks already contain a log attribute. If you are catching the issue in non-task code (such as a measurement plugin) then you can easily get a log object from pex_logging and log to that. I suggest creating it in the constructor so all methods of the plugin have access to it.
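As a stand-in for the pattern described above (the class and logger names here are hypothetical, and Python's stdlib `logging` is used in place of the stack's pex_logging logger), creating the log in the constructor might look like:

```python
import logging
import math

class VarianceLikePlugin:
    """Hypothetical plugin skeleton, not real measurement code: the
    logger is created in the constructor so every method can log
    messages that carry context."""

    def __init__(self, name="hypothetical.variancePlugin"):
        self.log = logging.getLogger(name)

    def measure(self, pixels):
        if not pixels:
            # A contextual message, unlike a bare numpy RuntimeWarning.
            self.log.warning("no unmasked pixels; recording NaN for source")
            return math.nan
        return sum(pixels) / len(pixels)
```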

          vpk24 Vishal Kasliwal [X] (Inactive) added a comment (edited)

          Do we want to change numpy.mean -> numpy.nanmean and numpy.median -> numpy.nanmedian?

          rowen Russell Owen added a comment

          Vishal Kasliwal [X], to answer your question about numpy.nanmean and numpy.nanmedian: I think you need to understand what the code is doing. What do NaNs indicate? Should NaNs be ignored, should they cause the mean and median to be NaN (and if so, what should happen then?), or should they cause the algorithm to fail at that point?

          vpk24 Vishal Kasliwal [X] (Inactive) added a comment (edited)

          Here's my understanding:
          In the case of the SingleFrameVariancePlugin, we are computing the median of the variance over the heavy footprint of a source. We check the mask array of the pixels in the heavy footprint against the mask corresponding to flags that indicate potentially bad sources (by default "DETECTED", "DETECTED_NEGATIVE", "BAD", "SAT"). We only compute the median over pixels that don't have those flags set, so we should never expect NaNs in the variance plane of the pixels we are taking the median of. Therefore we should not have to use numpy.nanmedian. In fact, we should not use numpy.nanmedian, because if we somehow do end up with one, some, or all NaN values in the variance plane, something else has gone wrong, and the NaN result alerts us to that event.
          Does that sound correct, or am I misunderstanding the algorithm?
          In conclusion: keep numpy.median rather than numpy.nanmedian.
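The behavioral difference being weighed here can be seen directly (illustrative values only):

```python
import numpy as np

v = np.array([1.0, 2.0, np.nan, 4.0])

med = np.median(v)        # NaN: the bad value propagates, flagging the upstream problem
nanmed = np.nanmedian(v)  # 2.0: the NaN is silently dropped
```

Keeping `numpy.median` means an unexpected NaN surfaces in the result instead of being masked.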

          vpk24 Vishal Kasliwal [X] (Inactive) added a comment

          Hi Pim,
          Can you please review this ticket? There are two commits. The first is a simple fix making the unit tests in testPsfDetermination.py use both StarSelector tasks in every test; I added this work to the ticket after checking with Jim Bosch that ideally both StarSelectors should be exercised in the tests. The second commit is the work requested by this ticket. Thanks!

          pschella Pim Schellart added a comment

          The overall solution looks fine, assuming the numpy warnings are really activated and we are sure they should be logged as warnings rather than errors, which is hard for me to judge.

          vpk24 Vishal Kasliwal [X] (Inactive) added a comment

          Merged to master....


            People

            • Assignee:
              vpk24 Vishal Kasliwal [X] (Inactive)
            • Reporter:
              rowen Russell Owen
            • Reviewers:
              Pim Schellart
            • Watchers:
              Jim Bosch, Paul Price, Pim Schellart, Russell Owen, Vishal Kasliwal [X] (Inactive)
            • Votes:
              0
            • Watchers:
              5
