Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-7040

Stars selected by starSelector change when number of cores varies

    Details

    • Templates:
    • Story Points:
      2
    • Epic Link:
    • Sprint:
      DRP F16-3
    • Team:
      Data Release Production

      Description

      Sogo Mineo writes in HSC-1414:

      See the following lines:

      meas_algorithms/HSC-4.0.0/python/lsst/meas/algorithms/objectSizeStarSelector.py:466
      in ObjectSizeStarSelector.selectStars():

          if psfCandidate.getWidth() == 0:
              psfCandidate.setBorderWidth(self._borderWidth)
              psfCandidate.setWidth(self._kernelSize + 2*self._borderWidth)
              psfCandidate.setHeight(self._kernelSize + 2*self._borderWidth)
      

      In reduceFrames, these lines set the width of psfCandidate to be 21
      for the first time the execution reaches there.

      When the first CCD image has been processed, the worker process
      continues to process another CCD image, and the execution reaches
      here again. This time, psfCandidate.getWidth() is 41, because
      psfexPsfDeterminer has set it to be 41, and the value has been
      retained because the width is a static member. And so, for the second
      CCD image, the width of psfCandidate is not 21 but 41.

      Since psfCandidates are widened, stars positioned at edges of images
      are rejected. It results in a smaller number of PSF candidates than expected.

      Only CCD images that are initially given to the worker processes
      are processed with psfCandidate.getWidth() == 21. The other CCD images are
      processed with psfCandidate.getWidth() == 41.
      When the number of SMP cores changes, CCD images are processed with different
      parameters.

      The change in the number of PSF candidates results in different Psf, a different
      result of image repair, and different catalogs.

      The line numbers are different on the LSST side because of refactoring (objectSizeStarSelector.py:466 has moved to starSelector.py:148), but the bug is still present. The main problem appears to be that the PsfCandidate elements are static, are being set in both the star selector and the PSF determiner and one of those is conditional on what the value is. I will investigate moving the static class variables to instance variables — the desired size appears to vary by context, so it shouldn't be a class variable.

        Issue Links

          Activity

          Hide
          price Paul Price added a comment -

          I think this is the tip of the iceberg of design issues in the PsfCandidate class.

          • _border and _var appear to be completely unused, either in PsfCandidate or elsewhere.
          • _amplitude is unused in in PsfCandidate, and only used by peripheral operations.
          • _defaultWidth isn't really a class variable because there's no way to set it — it should actually be a constant. It's only purpose is to guard against not having set the width, which could be dealt with by throwing an exception.
          • _xyCenter isn't necessary, as grand-super-class SpatialCellCandidate has x,y.

          I don't think I should be cleaning all that up as part of this ticket, but it should be noted as an area needing cleanup.

          I think I can solve this problem by just removing the conditional setting of the width.

          Show
          price Paul Price added a comment - I think this is the tip of the iceberg of design issues in the PsfCandidate class. _border and _var appear to be completely unused, either in PsfCandidate or elsewhere. _amplitude is unused in in PsfCandidate , and only used by peripheral operations. _defaultWidth isn't really a class variable because there's no way to set it — it should actually be a constant. It's only purpose is to guard against not having set the width, which could be dealt with by throwing an exception. _xyCenter isn't necessary, as grand-super-class SpatialCellCandidate has x,y. I don't think I should be cleaning all that up as part of this ticket, but it should be noted as an area needing cleanup. I think I can solve this problem by just removing the conditional setting of the width.
          Hide
          price Paul Price added a comment -

          Perry Gee, would you mind reviewing this? It's pretty small, consisting in only 4 lines of changes. The patch is below.

          pprice@tiger-sumire:~/LSST/meas/algorithms (tickets/DM-7040=) $ git sub-patch
          commit 7676c50591b2c7bc31574962c5a4991aadf62711
          Author: Paul Price <price@astro.princeton.edu>
          Date:   Thu Jul 28 16:25:35 2016 -0400
           
              starSelector: remove conditional setting of default width
              
              The width was only being set if the width was currently zero.
              This produced the following pattern:
              * Processing image 1:
                * star selector: width is zero --> set to 21
                * psf determiner: set width to 41
              * Processing image 2:
                * star selector: width is 41 --> leave
              ...
              
              The lack of consistency between the width used in the first star
              selector and the width used in subsequent star selector calls can
              affect the choice of PSF candidates, and makes the pipeline results
              subject to when it's run and in what manner.
              
              To fix this, we always set the default width in the star selector.
           
          diff --git a/python/lsst/meas/algorithms/starSelector.py b/python/lsst/meas/algorithms/starSelector.py
          index b88d2d7..1f01f15 100755
          --- a/python/lsst/meas/algorithms/starSelector.py
          +++ b/python/lsst/meas/algorithms/starSelector.py
          @@ -138,6 +138,7 @@ class BaseStarSelectorTask(pipeBase.Task):
                   goodStarCat = SourceCatalog(starCat.schema)
           
                   psfCandidateList = []
          +        didSetSize = False
                   for star in starCat:
                       try:
                           psfCandidate = algorithmsLib.makePsfCandidate(star, exposure)
          @@ -145,10 +146,11 @@ class BaseStarSelectorTask(pipeBase.Task):
                           # The setXXX methods are class static, but it's convenient to call them on
                           # an instance as we don't know Exposure's pixel type
                           # (and hence psfCandidate's exact type)
          -                if psfCandidate.getWidth() == 0:
          +                if not didSetSize:
                               psfCandidate.setBorderWidth(self.config.borderWidth)
                               psfCandidate.setWidth(self.config.kernelSize + 2*self.config.borderWidth)
                               psfCandidate.setHeight(self.config.kernelSize + 2*self.config.borderWidth)
          +                    didSetSize = True
           
                           im = psfCandidate.getMaskedImage().getImage()
                       except Exception as err:
          

          Show
          price Paul Price added a comment - Perry Gee , would you mind reviewing this? It's pretty small, consisting in only 4 lines of changes. The patch is below. pprice@tiger-sumire:~/LSST/meas/algorithms (tickets/DM-7040=) $ git sub-patch commit 7676c50591b2c7bc31574962c5a4991aadf62711 Author: Paul Price <price@astro.princeton.edu> Date: Thu Jul 28 16:25:35 2016 -0400   starSelector: remove conditional setting of default width The width was only being set if the width was currently zero. This produced the following pattern: * Processing image 1: * star selector: width is zero --> set to 21 * psf determiner: set width to 41 * Processing image 2: * star selector: width is 41 --> leave ... The lack of consistency between the width used in the first star selector and the width used in subsequent star selector calls can affect the choice of PSF candidates, and makes the pipeline results subject to when it's run and in what manner. To fix this, we always set the default width in the star selector.   diff --git a/python/lsst/meas/algorithms/starSelector.py b/python/lsst/meas/algorithms/starSelector.py index b88d2d7..1f01f15 100755 --- a/python/lsst/meas/algorithms/starSelector.py +++ b/python/lsst/meas/algorithms/starSelector.py @@ -138,6 +138,7 @@ class BaseStarSelectorTask(pipeBase.Task): goodStarCat = SourceCatalog(starCat.schema) psfCandidateList = [] + didSetSize = False for star in starCat: try: psfCandidate = algorithmsLib.makePsfCandidate(star, exposure) @@ -145,10 +146,11 @@ class BaseStarSelectorTask(pipeBase.Task): # The setXXX methods are class static, but it's convenient to call them on # an instance as we don't know Exposure's pixel type # (and hence psfCandidate's exact type) - if psfCandidate.getWidth() == 0: + if not didSetSize: psfCandidate.setBorderWidth(self.config.borderWidth) psfCandidate.setWidth(self.config.kernelSize + 2*self.config.borderWidth) psfCandidate.setHeight(self.config.kernelSize + 2*self.config.borderWidth) + didSetSize = True im = psfCandidate.getMaskedImage().getImage() except Exception as err:
          Hide
          pgee Perry Gee added a comment -

          The code is frightening, with statics in SpatialCellMaskedImageCandidate and PsfCandidate which potentially can be differently by different callers. Other than the class member _image, the PsfCandidate cannot know later on how the dimensions were set when it was created. This seems very fragile.

          This code change does insure that each psfCandidate in the list is has an _image with the same dimensions as was specified in the config. This is true as long as it is not interrupted in its star look by another piece of code with access to the same statics.

          Is there any possibility that while processing this starCat, the code could be interrupted by code which might change _width or _height? Same for _border in PsfCandidate. Not saying one way or the other – just wondering.

          I approve the fix if this is not an issue.

          Show
          pgee Perry Gee added a comment - The code is frightening, with statics in SpatialCellMaskedImageCandidate and PsfCandidate which potentially can be differently by different callers. Other than the class member _image, the PsfCandidate cannot know later on how the dimensions were set when it was created. This seems very fragile. This code change does insure that each psfCandidate in the list is has an _image with the same dimensions as was specified in the config. This is true as long as it is not interrupted in its star look by another piece of code with access to the same statics. Is there any possibility that while processing this starCat, the code could be interrupted by code which might change _width or _height? Same for _border in PsfCandidate. Not saying one way or the other – just wondering. I approve the fix if this is not an issue.
          Hide
          price Paul Price added a comment -

          PsfCandidate is certainly fragile due to the use of static, and I would love to see it redesigned (I think the static members should be moved into a separate builder class, all the way down the hierarchy), but that would probably take a few days and so it needs to be scheduled (Jim Bosch and John Swinbank: can you put PsfCandidate and ancestors on the list of redesigns?). We've gotten away with the current design for a while, so a redesign is probably not urgent.

          Unfortunately, the use of static makes code using PsfCandidate not strictly re-entrant, and I don't think there's much we can do about that without a redesign. However, most python codes (including ours) don't use threads (because of the GIL), but use processes instead. With processes, we don't have to worry about the lack of re-entrancy. All we have to do is make sure that we set the parameters we want before operating on the set, and that's the purpose of this change.

          I will merge later today if there are no objections.

          Show
          price Paul Price added a comment - PsfCandidate is certainly fragile due to the use of static , and I would love to see it redesigned (I think the static members should be moved into a separate builder class, all the way down the hierarchy), but that would probably take a few days and so it needs to be scheduled ( Jim Bosch and John Swinbank : can you put PsfCandidate and ancestors on the list of redesigns?). We've gotten away with the current design for a while, so a redesign is probably not urgent. Unfortunately, the use of static makes code using PsfCandidate not strictly re-entrant, and I don't think there's much we can do about that without a redesign. However, most python codes (including ours) don't use threads (because of the GIL), but use processes instead. With processes, we don't have to worry about the lack of re-entrancy. All we have to do is make sure that we set the parameters we want before operating on the set, and that's the purpose of this change. I will merge later today if there are no objections.
          Hide
          pgee Perry Gee added a comment -

          Yes, the code is certainly not thread-safe, which is why I kicked it back to you. Wasn't sure if that was an issue or not.

          I assume that you looked at the code enough to assure yourself that once the image is created, the psfCandidate won't ever call getWidth or getHeight again? I notice that if candidate.getMaskedImage() is called later on, and the underlying _width or _height has been altered, it looks like getMaskedImage(width, height) will end up making a new image with the new class static _width and _height.

          Show
          pgee Perry Gee added a comment - Yes, the code is certainly not thread-safe, which is why I kicked it back to you. Wasn't sure if that was an issue or not. I assume that you looked at the code enough to assure yourself that once the image is created, the psfCandidate won't ever call getWidth or getHeight again? I notice that if candidate.getMaskedImage() is called later on, and the underlying _width or _height has been altered, it looks like getMaskedImage(width, height) will end up making a new image with the new class static _width and _height.
          Hide
          price Paul Price added a comment -

          You're exactly right about how this works. The PsfexPsfDeterminer will later change the width and height; that's fine, so long as we set the width and height back; that's what this ticket does.

          Show
          price Paul Price added a comment - You're exactly right about how this works. The PsfexPsfDeterminer will later change the width and height; that's fine, so long as we set the width and height back; that's what this ticket does.
          Hide
          price Paul Price added a comment -

          Merged to master.

          Show
          price Paul Price added a comment - Merged to master.
          Hide
          pgee Perry Gee added a comment -

          That isn't what I meant.

          Even if the width and height are set consistently based on the config values, the PsfCandidate.getMaskedImage() call will return an image based on the current underlying static _width and _height, possibly not the same as when the original PsfCandidate and _image were created in the code this ticket modifies.

          That seems like unpredictable behavior to me, but I trust that you know this code better than I do.

          Show
          pgee Perry Gee added a comment - That isn't what I meant. Even if the width and height are set consistently based on the config values, the PsfCandidate.getMaskedImage() call will return an image based on the current underlying static _width and _height, possibly not the same as when the original PsfCandidate and _image were created in the code this ticket modifies. That seems like unpredictable behavior to me, but I trust that you know this code better than I do.
          Hide
          price Paul Price added a comment -

          Unfortunately that's how it works at the moment. However, all uses of the PsfCandidate are prefixed with code to set the desired width and height for that use, so I believe everything is about the best it can be with this fragile design.

          Show
          price Paul Price added a comment - Unfortunately that's how it works at the moment. However, all uses of the PsfCandidate are prefixed with code to set the desired width and height for that use, so I believe everything is about the best it can be with this fragile design.

            People

            • Assignee:
              price Paul Price
              Reporter:
              price Paul Price
              Reviewers:
              Perry Gee
              Watchers:
              Jim Bosch, Paul Price, Perry Gee
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development

                  Agile