  Data Management / DM-9561

Improve Monotonicity Operator

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: meas_deblender
    • Labels:
      None

      Description

      Preliminary testing of the NMF deblender shows that the radial monotonicity operator as designed does not work as expected. Peter Melchior and I have verified that the monotonicity operator itself is built properly based on our earlier design, which uses a single reference pixel for each pixel (the one that lies closest to a radial line from the peak to the current pixel).

      Based on a pixel's position relative to the peak, it lies in one of 8 octants that determines which pixel it will use as a reference, but it appears that the transition between reference pixel positions causes the monotonicity operator to generate unphysical streaks in the deblended objects.

      We are redesigning the monotonicity operator to use a weighted combination of the three neighboring pixels that are closer to the peak as the reference, which we hope will fix this issue.
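
      As an illustration (not part of the original design text), here is a conceptual sketch of this weighting for a single pixel, assuming each closer neighbor is weighted by how well the step toward it aligns with the radial direction to the peak (the same cosine weighting used in the operator code further down); the helper name is hypothetical:

          import numpy as np

          def neighborWeights(pixel, peak):
              """Normalized weights for the neighbors of ``pixel`` that lie closer to ``peak``."""
              px, py = peak
              x, y = pixel
              toPeak = np.arctan2(py - y, px - x)  # direction from the pixel toward the peak
              weights = {}
              for dx in (-1, 0, 1):
                  for dy in (-1, 0, 1):
                      if dx == 0 and dy == 0:
                          continue
                      nx, ny = x + dx, y + dy
                      # only neighbors strictly closer to the peak can act as references
                      if np.hypot(nx - px, ny - py) >= np.hypot(x - px, y - py):
                          continue
                      toNeighbor = np.arctan2(dy, dx)  # direction from the pixel toward this neighbor
                      weights[(nx, ny)] = np.cos(toNeighbor - toPeak)
              total = sum(weights.values())
              return {n: w/total for n, w in weights.items()} if total > 0 else weights

          # A pixel down and to the right of the peak has exactly three closer neighbors;
          # the neighbor along the radial line receives the largest weight.
          print(neighborWeights(pixel=(10, 10), peak=(5, 5)))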

        Attachments

        1. blend.period.png (227 kB)
        2. blend.png (193 kB)
        3. color.snowflake.png (83 kB)
        4. ellipse.period.png (197 kB)
        5. ellipse.png (133 kB)
        6. masked.snowflake.png (153 kB)
        7. observation.period.png (221 kB)
        8. observation.png (105 kB)

          Activity

          fred3m Fred Moolekamp added a comment - edited

          DM-9172 involves the creation of a testing framework to debug and evaluate the performance of both the old and new deblenders. Optimizing the monotonicity operator was necessary to complete DM-9172 in order for the tests to run in a reasonable amount of time, so the code is included on that branch.

          Below is the code for the optimized monotonicity operator, which may still change before this ticket is completed due to unexpected consequences of the current monotonicity parameter. This includes code to make more general sparse arrays with 8 diagonals (one for each neighbor), which will likely be used in the creation of other operators.

          import numpy as np
          from scipy import sparse

          def getOffsets(width):
              """Get the offset and slices for a sparse band diagonal array
              
              For an operator that interacts with its neighbors we want a band diagonal matrix,
              where each row describes the 8 pixels that are neighbors for the reference pixel
              (the diagonal). Regardless of the operator, these 8 bands are always the same,
              so we make a utility function that returns the offsets (passed to scipy.sparse.diags).
              
              See `diagonalizeArray` for more on the slices and format of the array used to create
              NxN operators that act on a data vector.
              """
              offsets = [-width-1, -width, -width+1, -1, 1, width-1, width, width+1]
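              # ``slices`` select the portion of the flattened data that falls on each band,
              # while ``slicesInv`` select where that portion sits within a full-length band row.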
              slices = [slice(None, s) if s<0 else slice(s, None) for s in offsets]
              slicesInv = [slice(-s, None) if s<0 else slice(None, -s) for s in offsets]
              return offsets, slices, slicesInv
           
          def diagonalizeArray(arr, shape=None, dtype=np.float64):
              """Convert an array to a matrix that compares each pixel to its neighbors
              
              Given an array with length N, create an 8xN array, where each row will be a
              diagonal in a diagonalized array. Each column in this matrix is a row in the larger
              NxN matrix used for an operator, except that this 2D array only contains the values
              used to create the bands in the band diagonal matrix.
              
              Because the off-diagonal bands have less than N elements, ``getOffsets`` is used to
              create a mask that will set the elements of the array that are outside of the matrix to zero.
              
              ``arr`` is the vector to diagonalize, for example the distance from each pixel to the peak,
              or the angle of the vector to the peak.
              
              ``shape`` is the shape of the original image.
              """
              if shape is None:
                  height, width = arr.shape
                  data = arr.flatten()
              elif len(arr.shape)==1:
                  height, width = shape
                  data = np.copy(arr)
              else:
                  raise ValueError("Expected either a 2D array or a 1D array and a shape")
              size = width * height
              
              # We hard code 8 rows, since each row corresponds to a neighbor
              # of each pixel.
              diagonals = np.zeros((8, size), dtype=dtype)
              mask = np.ones((8, size), dtype=bool)
              offsets, slices, slicesInv = getOffsets(width)
              for n, s in enumerate(slices):
                  diagonals[n][slicesInv[n]] = data[s]
                  mask[n][slicesInv[n]] = 0
              
              # Create a mask to hide false neighbors for pixels on the edge
              # (for example, a pixel on the left edge should not be connected to the
              # pixel to its immediate left in the flattened vector, since that pixel
              # is actually the far-right pixel on the row above it).
              mask[0][np.arange(1,height)*width] = 1
              mask[2][np.arange(height)*width-1] = 1
              mask[3][np.arange(1,height)*width] = 1
              mask[4][np.arange(1,height)*width-1] = 1
              mask[5][np.arange(height)*width] = 1
              mask[7][np.arange(1,height-1)*width-1] = 1
              
              return diagonals, mask
           
          def diagonalsToSparse(diagonals, shape, dtype=np.float64):
              """Convert a diagonalized array into a sparse diagonal matrix
              
              ``diagonalizeArray`` creates an 8xN array representing the bands that describe the
              interactions of a pixel with its neighbors. This function takes that 8xN array and converts
              it into a sparse diagonal matrix.
              
              See `diagonalizeArray` for the details of the 8xN array. 
              """
              height, width = shape
              offsets, slices, slicesInv = getOffsets(width)
              diags = [diag[slicesInv[n]] for n, diag in enumerate(diagonals)]
              
              # This block hides false neighbors for the edge pixels (see comments in the diagonalizeArray code).
              # For now we assume that the mask in diagonalizeArray has already been applied, making these
              # lines redundant and unnecessary, but if that changes in the future we can uncomment them.
              #diags[0][np.arange(1,height-1)*width-1] = 0
              #diags[2][np.arange(height)*width] = 0
              #diags[3][np.arange(1,height)*width-1] = 0
              #diags[4][np.arange(1,height)*width-1] = 0
              #diags[5][np.arange(height)*width] = 0
              #diags[7][np.arange(1,height-1)*width-1] = 0
              
              diagonalArr = sparse.diags(diags, offsets, dtype=dtype)
              return diagonalArr
           
          def getRadialMonotonicOp(shape, px, py, useNearest=True, minGradient=.9):
              """Create an operator to constrain radial monotonicity
              
              This version of the radial monotonicity operator selects all of the pixels closer to the peak
              for each pixel and weights their flux based on their alignment with a vector from the pixel
              to the peak. In order to quickly create this using sparse matrices, its construction is a bit opaque.
              """
              # Calculate the distance between each pixel and the peak
              size = shape[0]*shape[1]
              x = np.arange(shape[1])
              y = np.arange(shape[0])
              X,Y = np.meshgrid(x,y)
              X = X - px
              Y = Y - py
              distance = np.sqrt(X**2+Y**2)
              
              # Find each pixel's neighbors farther from the peak and mark them as invalid
              # (to be removed later)
              distArr, mask = diagonalizeArray(distance, dtype=np.float64)
              relativeDist = (distance.flatten()[:,None]-distArr.T).T
              invalidPix = relativeDist<=0
              
              # Calculate the angle between each pixel and the x axis, relative to the peak position
              # (also avoid dividing by zero and set the tan(infinity) pixel values to pi/2 manually)
              inf = X==0
              tX = X.copy()
              tX[inf] = 1
              angles = np.arctan2(-Y,-tX)
              angles[inf&(Y!=0)] = 0.5*np.pi*np.sign(angles[inf&(Y!=0)])
              
              # Calculate the angle between each pixel and its neighbors
              xArr, m = diagonalizeArray(X)
              yArr, m = diagonalizeArray(Y)
              dx = (xArr.T-X.flatten()[:, None]).T
              dy = (yArr.T-Y.flatten()[:, None]).T
              # Avoid dividing by zero and set the tan(infinity) pixel values to pi/2 manually
              inf = dx==0
              dx[inf] = 1
              relativeAngles = np.arctan2(dy,dx)
              relativeAngles[inf&(dy!=0)] = 0.5*np.pi*np.sign(relativeAngles[inf&(dy!=0)])
              
              # Find the difference between each pixel's angle to the peak
              # and the relative angles to its neighbors, and take the
              # cos to find each neighbor's weight
              dAngles = (angles.flatten()[:, None]-relativeAngles.T).T
              cosWeight = np.cos(dAngles)
              # Mask edge pixels, array elements outside the operator (for off-diagonal bands with < N elements),
              # and neighbors farther from the peak than the reference pixel
              cosWeight[invalidPix] = 0
              cosWeight[mask] = 0
              
              if useNearest:
                  # Only use a single pixel most in line with peak
                  cosNorm = np.zeros_like(cosWeight)
                  columnIndices =  np.arange(cosWeight.shape[1])
                  maxIndices = np.argmax(cosWeight, axis=0)
                  indices = maxIndices*cosNorm.shape[1]+columnIndices
                  indices = np.unravel_index(indices, cosNorm.shape)
                  cosNorm[indices] = minGradient
                  # Remove the reference for the peak pixel
                  cosNorm[:,px+py*shape[1]] = 0
              else:
                  # Normalize the cos weights for each pixel
                  normalize = np.sum(cosWeight, axis=0)
                  normalize[normalize==0] = 1
                  cosNorm = (cosWeight.T/normalize[:,None]).T
                  cosNorm[mask] = 0
              cosArr = diagonalsToSparse(cosNorm, shape)
              
              # The identity with the peak pixel removed represents the reference pixels
              diagonal = np.ones(size)
              diagonal[px+py*shape[1]] = -1
              monotonic = cosArr-sparse.diags(diagonal)
              
              return monotonic.tocoo()
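
          As a quick sanity check (my own illustration, not code from the ticket), the operator can be built for a small image and applied to a circular Gaussian centered on the peak; such a profile is radially monotonic, so with minGradient=1 the result is expected to be non-negative everywhere. This assumes numpy and scipy.sparse are imported as np and sparse, as above.

          shape = (25, 25)
          px, py = 12, 12
          Y, X = np.indices(shape)
          gaussian = np.exp(-0.5*((X - px)**2 + (Y - py)**2)/3.0**2)

          monotonic = getRadialMonotonicOp(shape, px, py, useNearest=True, minGradient=1)
          residual = monotonic.dot(gaussian.flatten())
          print(residual.min() >= 0)  # expected True for a radially monotonic profile

          # The weighted mode spreads the reference over all closer neighbors instead.
          monotonicWeighted = getRadialMonotonicOp(shape, px, py, useNearest=False)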
          

          fred3m Fred Moolekamp added a comment -

          The code above has been modified to allow users to force a gradient weight (minGradient=0.9 by default) in an attempt to minimize footprint size, and to allow the use of a single reference pixel (chosen more correctly and more efficiently than in the previous operator this ticket was meant to replace) instead of weighted pixels, which appears to work better. Choosing minGradient=1 only enforces that the flux is radially non-decreasing toward the peak.

          The biggest issue is that in regions dominated by noise, monotonicity does not enforce steep enough gradients, so the footprints grow too large and include excess noise in flux measurements. The flux also tends to flatten out in overlap regions when color information is insufficient (i.e., the SEDs are too similar), indicating that monotonicity alone will not be enough to solve this problem.

          fred3m Fred Moolekamp added a comment -

          Samples of sources illustrating the flaw with the monotonicity operator that causes an overestimate of flux.

          fred3m Fred Moolekamp added a comment -

          The above images show the main flaw in using monotonicity: while the above footprints are monotonic, they spread out well beyond the extent of the source and exhibit a fractal-like structure that causes them to overestimate the flux. Sparsity might help with this problem, as well as assist in situations where multiple objects overlap in a blend. However, sparsity alone is not likely to be sufficient, as strongly blended objects in crowded cluster cores are still likely to fail.

          fred3m Fred Moolekamp added a comment -

          Peter Melchior and I are currently investigating the use of a constraint that might enforce both monotonicity and symmetry. The basic idea is that we actually have a bit more information about the shape of the objects than we are currently exploiting. One advantage of the new deblender is that (by construction) we have information about all of the sources in the blend. In other words, we know which pixels are overlapping between sources at each step as we iteratively work toward a solution. If we treat the overlapping areas as missing data, we might be able to use information that we already have about the radial and azimuthal shape of the object to fill in the gaps, even in cases where an object is blended on both sides.

          For example, if we begin with an ellipse:

          we see that when we look at its intensity as a function of radius (in pixels) and azimuthal angle (in radians):

          we can clearly pick out the axis of rotation, in this case theta=0.

          Now if we look at two blended ellipses:

          we see that the intensity is no longer periodic with period pi:

          Even using real (simulated) data, we see that for an object whose elliptical shape is barely perceptible:

          we can still pick out its orientation:

          If we can come up with a constraint that forces the images toward our estimation of the blended areas, which will change in each iteration as the footprints themselves change, we might be able to better approximate the flux in overlapping regions of monochromatic sources.
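
          To make this concrete, below is a rough sketch (my own illustration with hypothetical helper names, not code from this ticket) of the radius/azimuth view described above: bilinearly resample the image onto a polar grid centered on the peak and measure how far it is from being periodic with period pi in azimuth. An isolated ellipse should score near zero, while a blend should not.

          import numpy as np

          def polarProfile(img, px, py, nRadius=30, nTheta=60):
              """Bilinearly sample ``img`` on an (nRadius, nTheta) polar grid centered on (px, py)."""
              rMax = min(px, py, img.shape[1] - 1 - px, img.shape[0] - 1 - py)
              r = np.linspace(0, rMax, nRadius)
              theta = np.linspace(-np.pi, np.pi, nTheta, endpoint=False)
              R, T = np.meshgrid(r, theta, indexing="ij")
              x = px + R*np.cos(T)
              y = py + R*np.sin(T)
              x0 = np.clip(np.floor(x).astype(int), 0, img.shape[1] - 2)
              y0 = np.clip(np.floor(y).astype(int), 0, img.shape[0] - 2)
              fx, fy = x - x0, y - y0
              return ((1 - fx)*(1 - fy)*img[y0, x0] + fx*(1 - fy)*img[y0, x0 + 1] +
                      (1 - fx)*fy*img[y0 + 1, x0] + fx*fy*img[y0 + 1, x0 + 1])

          def piAsymmetry(profile):
              """RMS difference between the polar profile and itself shifted by pi in azimuth."""
              shifted = np.roll(profile, profile.shape[1]//2, axis=1)
              return np.sqrt(np.mean((profile - shifted)**2))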

          pmelchior Peter Melchior added a comment -

          While a constraint in the polar coordinate frame should allow us to impose constraints on symmetry AND monotonicity in one penalty function, it's not a straightforward operation. The shape of the penalty function would have to encode properties of real galaxies, while penalizing features that appear to originate from blends. Definitely possible, but not trivial. We need to think about it more.

          The problem can also be solved by adding an L0 sparsity penalty, which we can already do in the current algorithm. Even a small amount of L0 penalty would get rid of those extended streaks. The problem with this penalty arises in the overlap region of two legitimate objects, where the strength of the penalty decides how clean the cut between the objects will be. So there is a parameter to optimize, but we have a validation metric (the difference between the correlation coefficient of the input galaxies and the correlation coefficient of the deblender outputs) that can tell us whether the deblender is too aggressive or too lenient.
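
          For reference, a minimal sketch (not from this ticket) of how an L0 penalty is typically applied in a proximal-gradient step: the proximal operator of lam*||x||_0 is a hard threshold that zeroes any pixel whose absolute value falls below sqrt(2*lam), which would suppress the faint extended streaks.

          import numpy as np

          def proxL0(x, lam):
              """Hard threshold: keep entries with |x_i| > sqrt(2*lam), set the rest to zero."""
              return np.where(np.abs(x) > np.sqrt(2.0*lam), x, 0.0)

          # e.g. applied to a footprint after each gradient step of a proximal algorithm:
          # footprint = proxL0(footprint - step*gradient, lam=step*penaltyStrength)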

          fred3m Fred Moolekamp added a comment -

          Since this ticket was really meant to cover the creation of the monotonicity operator itself, I think it can be marked as finished. Peter Melchior, would you mind reviewing this? (I fixed a minor bug the other day that I don't think you have in your latest code, so make sure to copy getRadialMonotonicOp, which I just edited this morning.)

          pmelchior Peter Melchior added a comment -

          This solves the drastic streaks the old operator had. The operator now works as expected; the limitations of monotonicity that remain (like the low-level streaks) need to be solved at a higher level.


            People

            • Assignee:
              fred3m Fred Moolekamp
              Reporter:
              fred3m Fred Moolekamp
              Reviewers:
              Peter Melchior
              Watchers:
              Fred Moolekamp, Peter Melchior

