Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-4071

testPsfDetermination broken due to NumPy behaviour change

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: meas_algorithms
    • Labels:
      None

      Description

      Old NumPy behaviour (tested on 1.6.2):

      In [1]: import numpy
       
      In [2]: a = numpy.array([])
       
      In [3]: numpy.median(a)
      /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2374: RuntimeWarning: invalid value encountered in double_scalars
        return mean(axis, dtype, out)
       
      Out[3]: nan
      

      New NumPy behaviour (1.10.0):

      In [1]: import numpy
       
      In [2]: a = numpy.array([])
       
      In [3]: numpy.median(a)
      [...]
      IndexError: index -1 is out of bounds for axis 0 with size 0
      

      This breaks testPsfDeterminer and testPsfDeterminerSubimage, e.g.:

      ERROR: testPsfDeterminerSubimage (__main__.SpatialModelPsfTestCase)
      Test the (PCA) psfDeterminer on subImages
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "./testPsfDetermination.py", line 342, in testPsfDeterminerSubimage
          trimCatalogToImage(subExp, self.catalog))
        File "/Users/jds/Projects/Astronomy/LSST/src/meas_algorithms/python/lsst/meas/algorithms/objectSizeStarSelector.py", line 377, in selectStars
          widthStdAllowed=self._widthStdAllowed)
        File "/Users/jds/Projects/Astronomy/LSST/src/meas_algorithms/python/lsst/meas/algorithms/objectSizeStarSelector.py", line 195, in _kcenters
          centers[i] = func(yvec[clusterId == i])
        File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3084, in median
          overwrite_input=overwrite_input)
        File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/function_base.py", line 2997, in _ureduce
          r = func(a, **kwargs)
        File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3138, in _median
          n = np.isnan(part[..., -1])
      IndexError: index -1 is out of bounds for axis 0 with size 0
      

        Attachments

          Issue Links

            Activity

            Hide
            swinbank John Swinbank added a comment - - edited

            Here's a simple fix. Suggestions as to whether there's a more efficient way to do this with NumPy magic welcome.

            Show
            swinbank John Swinbank added a comment - - edited Here's a simple fix . Suggestions as to whether there's a more efficient way to do this with NumPy magic welcome.
            Hide
            rowen Russell Owen added a comment - - edited

            It would be nice if median and mean had an argument that would make them return nan if provided with an empty vector, but I see no such thing, so I think your fix is the right way to go.

            Your code would be more efficient if you only computed the index array once. I also think it is a bit more more idiomatic to use any as a function instead of a method. Hence, something like this:

                    for i in range(nCluster):
                        indArr = clusterId == i
                        if numpy.any(indArr):
                            centers[i] = func(yvec[indArr])
                        else:
                            centers[i] = numpy.nan
            

            I admit that I don't find indArr = clusterId == i very readable, though. I'm not sure if this is any better: indArr = numpy.array(clusterId == i).

            Show
            rowen Russell Owen added a comment - - edited It would be nice if median and mean had an argument that would make them return nan if provided with an empty vector, but I see no such thing, so I think your fix is the right way to go. Your code would be more efficient if you only computed the index array once. I also think it is a bit more more idiomatic to use any as a function instead of a method. Hence, something like this: for i in range(nCluster): indArr = clusterId == i if numpy.any(indArr): centers[i] = func(yvec[indArr]) else: centers[i] = numpy.nan I admit that I don't find indArr = clusterId == i very readable, though. I'm not sure if this is any better: indArr = numpy.array(clusterId == i) .
            Hide
            swinbank John Swinbank added a comment -

            Thanks Russell Owen; I've actually already done something similar to your suggestion in response to Tim Jenness's comment on the PR. I'll switch it to use any as a function rather than a method – I think you're right that it's a bit neater, but I'm not really aware of a reason to prefer one version to another.

            Tim Jenness – were there any further issues or is this now reviewed?

            Show
            swinbank John Swinbank added a comment - Thanks Russell Owen ; I've actually already done something similar to your suggestion in response to Tim Jenness 's comment on the PR. I'll switch it to use any as a function rather than a method – I think you're right that it's a bit neater, but I'm not really aware of a reason to prefer one version to another. Tim Jenness – were there any further issues or is this now reviewed?
            Hide
            rowen Russell Owen added a comment -

            By the way, you are welcome to consider my comments a review. Thank you very much for fixing this so quickly.

            Show
            rowen Russell Owen added a comment - By the way, you are welcome to consider my comments a review. Thank you very much for fixing this so quickly.
            Hide
            rowen Russell Owen added a comment -

            Here are some reasons why I tend to prefer functions to methods in numpy:

            • Functions work on more data types
            • Not all functions have equivalent methods

            Also, in this particular case "any" is also a built in python function, and would probably do just as well here (but numpy.any might be a bit faster). For such generic behavior it is arguably a bit more "pythonic". e.g. in the sense of functions such as "len"

            Show
            rowen Russell Owen added a comment - Here are some reasons why I tend to prefer functions to methods in numpy: Functions work on more data types Not all functions have equivalent methods Also, in this particular case "any" is also a built in python function, and would probably do just as well here (but numpy.any might be a bit faster). For such generic behavior it is arguably a bit more "pythonic". e.g. in the sense of functions such as "len"
            Hide
            tjenness Tim Jenness added a comment -

            Looks good now. Thanks.

            Show
            tjenness Tim Jenness added a comment - Looks good now. Thanks.
            Hide
            swinbank John Swinbank added a comment -

            Thanks both for your comments. Now merged.

            Show
            swinbank John Swinbank added a comment - Thanks both for your comments. Now merged.

              People

              • Assignee:
                swinbank John Swinbank
                Reporter:
                swinbank John Swinbank
                Reviewers:
                Tim Jenness
                Watchers:
                John Swinbank, Russell Owen, Tim Jenness
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: