Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-12184

Coaddition Tasks cannot assume that N masks can fit in memory

    Details

    • Story Points:
      8
    • Sprint:
      DRP S18-2
    • Team:
      Data Release Production

      Description

      Perry Gee and Tony Johnson have recently reported high memory usage by SafeClipAssembleCoadd for examples with either large N (number of warps) or large patch sizes.
      For simulated data processed by DESC, Tony Johnson reported 70GB processes coadding 6kX6k patches and unknown N. (Fill this in if you remember)
      For DES data, Perry Gee reported memory usage >32GB with 10kX10k patches and unknown N. (Note: patches this large are not recommended). Masks were still uint16 at the time.

      Profiling reveals that this memory usage was likely due to holding N masks in memory. In this example, patches were 4200X4200, mask pixels = int32, and N=33. 4200*4200*4*33/2**20 = 2220MiB. Full profiling print out attached.

      Line #    Mem usage    Increment   Line Contents
      ================================================
        1294    556.6 MiB      0.0 MiB       @profile
        1295                                 def detectClip(self, exp, tempExpRefList):
        1296                                     """!
        1297                                     \brief Detect clipped regions on an exposure and set the mask on the individual tempExp masks
       
      ...                         
       
        1326    586.2 MiB      0.0 MiB           clipFootprints = []
        1327    586.2 MiB      0.0 MiB           clipIndices = []
        1328                             
        1329                                     # build a list with a mask for each visit which can be modified with clipping information
        1330    586.2 MiB      0.0 MiB           tempExpClipList = [tmpExpRef.get(self.getTempExpDatasetName(self.warpType),
        1331                                                                      immediate=True).getMaskedImage().getMask() for
        1332   2806.8 MiB   2220.7 MiB                              tmpExpRef in tempExpRefList]
        1333                             
        1334   2813.1 MiB      6.3 MiB           for footprint in footprints.getFootprints():
        1335   2813.1 MiB      0.0 MiB               nPixel = footprint.getArea()
      

      For the simulated data example, N=~1000, patch size = 6KX6K, mask pixel size = uint16 (May 2016) , would have yielded a theoretical usage of 6e3*6e3*2*1000/2**30 = 67GiB. For the DES, example with the 10kX10k patches any more than 150 visits would have not fit on the 32G system.

      Now that we've switched to int32 masks this is even more important, as the memory usage doubles for examples with large N or large patches. Even with 4k X 4k patches, it only takes N=180 to get memory usage > 10GB.

      This ticket will remove the assumption that N masks can fit in memory from the implementation of CompareWarpAssembleCoadd. Because the solution will likely be the same for SafeClipAssemble too it will likely be implemented at the same time.

        Attachments

          Issue Links

            Activity

            Hide
            yusra Yusra AlSayyad added a comment - - edited

            Bob Armstrong Do you have time to review?

            I tried to maintain as much of the existing SafeClip as possible.  It  now reads in the masks from disk one more time than it used to (once in detectClip and once in detectClipBig) If you see a way to further refactor to remove this extra read (maybe by merging detectClip and detectClipBig into one method), please note.

             

            https://github.com/lsst/pipe_tasks/pull/177/ 

            Show
            yusra Yusra AlSayyad added a comment - - edited Bob Armstrong Do you have time to review? I tried to maintain as much of the existing SafeClip as possible.  It  now reads in the masks from disk one more time than it used to (once in detectClip and once in detectClipBig) If you see a way to further refactor to remove this extra read (maybe by merging detectClip and detectClipBig into one method), please note.   https://github.com/lsst/pipe_tasks/pull/177/  
            Hide
            rearmstr Bob Armstrong added a comment -

            I don't think there is a simple way to refactor it, so I would leave it as is, unless there is a need for it.

            Show
            rearmstr Bob Armstrong added a comment - I don't think there is a simple way to refactor it, so I would leave it as is, unless there is a need for it.
            Hide
            yusra Yusra AlSayyad added a comment - - edited

            OK, Bob and I poked at it to see if we could remove the second read.  Success on that front. 

            This isn't the cleanest refactor of SafeClip possible.  `detectClipBig` now modifies the result of `detectClip` which tells me that they should probably be merged into one method, or `detectClipBig` should be renamed to `appendClipBig`. 

            Jenkins running now. 

            https://ci.lsst.codes/job/stack-os-matrix/27370/

            https://ci.lsst.codes/job/stack-os-matrix/27372/

            Show
            yusra Yusra AlSayyad added a comment - - edited OK, Bob and I poked at it to see if we could remove the second read.  Success on that front.  This isn't the cleanest refactor of SafeClip possible.  `detectClipBig` now modifies the result of `detectClip` which tells me that they should probably be merged into one method, or `detectClipBig` should be renamed to `appendClipBig`.  Jenkins running now.  https://ci.lsst.codes/job/stack-os-matrix/27370/ https://ci.lsst.codes/job/stack-os-matrix/27372/
            Hide
            rearmstr Bob Armstrong added a comment -

            I don't think the modification is a problem. It was always effectively modifying the results of detectClip. You could certainly merge the two methods, but I'm hoping we can move away from this code soon.

            Show
            rearmstr Bob Armstrong added a comment - I don't think the modification is a problem. It was always effectively modifying the results of detectClip. You could certainly merge the two methods, but I'm hoping we can move away from this code soon.

              People

              • Assignee:
                yusra Yusra AlSayyad
                Reporter:
                yusra Yusra AlSayyad
                Reviewers:
                Bob Armstrong
                Watchers:
                Bob Armstrong, Nate Lust, Paul Price, Perry Gee, Simon Krughoff, Tony Johnson, Yusra AlSayyad
              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel