Data Management / DM-35008

trailedSources tests very slow on macOS ARM


    Details

    • Team:
      Architecture
    • Urgent?:
      No

      Description

      DM-33325 added some Monte Carlo tests to meas_extensions_trailedSources and they are a disaster for macOS ARM CPUs.

      On an old Intel Mac mini:

       ============================= slowest 5 durations ==============================
       32.91s call     tests/test_trailedSources.py::TrailedSourcesTestCase_18.823189933596083_5.042446276240214_52.688133441854_0.5272863678675188::testMonteCarlo
       32.42s call     tests/test_trailedSources.py::TrailedSourcesTestCase_18.824525959248994_3.690341720399934_12.706783496622066_80.0121374592317::testMonteCarlo
       30.48s call     tests/test_trailedSources.py::TrailedSourcesTestCase_18.778338539085038_3.712755610201576_77.74789748127772_8.918117596264874::testMonteCarlo
       29.14s call     tests/test_trailedSources.py::TrailedSourcesTestCase_19.632089387567788_4.363012829054862_13.575047038897603_39.65613368621441::testMonteCarlo
       28.94s call     tests/test_trailedSources.py::TrailedSourcesTestCase_18.978084668646876_0.541677473106708_34.88128456167191_52.19822420855503::testMonteCarlo
       ================= 161 passed, 21 warnings in 144.23s (0:02:24) =================
      

      But on an M1 Pro Mac:

       ============================= slowest 5 durations ==============================
       213.96s call     tests/test_trailedSources.py::TrailedSourcesTestCase_18.823189933596083_5.042446276240214_52.688133441854_0.5272863678675188::testMonteCarlo
       209.18s call     tests/test_trailedSources.py::TrailedSourcesTestCase_18.824525959248994_3.690341720399934_12.706783496622066_80.0121374592317::testMonteCarlo
       185.70s call     tests/test_trailedSources.py::TrailedSourcesTestCase_18.778338539085038_3.712755610201576_77.74789748127772_8.918117596264874::testMonteCarlo
       176.01s call     tests/test_trailedSources.py::TrailedSourcesTestCase_15.552399069150322_1.2533910548710816_24.737438752106133_87.04870569671161::testMonteCarlo
       166.66s call     tests/test_trailedSources.py::TrailedSourcesTestCase_19.701132323
       ================= 161 passed, 21 warnings in 652.78s (0:10:52) =================
      

      It turns out that nearly all the time is spent in getMaskPlane calls in the meas_base noise replacer:

      try:
          # does it already exist?
          plane = mask.getMaskPlane(maskname)
          if self.log:
              self.log.debug('Mask plane "%s" already existed', maskname)
      except Exception:
          # if not, add it; we should delete it when done.
          plane = mask.addMaskPlane(maskname)
          self.removeplanes.append(maskname)
      

      This indicates that it is another instance of DM-33478.

      Replacing the try/except with a membership test,

      if maskname in mask.getMaskPlaneDict():
      

      fixes the problem, since it avoids having the C++ layer raise an exception across the pybind11 boundary.
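
      The cost difference is easy to demonstrate even in pure Python; an exception that has to cross the C++/pybind11 boundary is far more expensive still. The sketch below uses a plain dict as a toy stand-in for the mask-plane registry (the names `eafp`, `lbyl`, and `planes` are hypothetical, not the afw API) to show why a membership test beats try/except when the "miss" path is hot:

```python
import timeit

# Toy stand-in for the mask-plane registry; in the real code the lookup
# goes through pybind11 into C++, where a raised exception costs far more.
planes = {"BAD": 0, "SAT": 1, "DETECTED": 2}

def eafp(name):
    # Mirrors the original pattern: attempt the lookup, catch the failure.
    try:
        return planes[name]
    except KeyError:
        return None

def lbyl(name):
    # Mirrors the fix: test membership first, no exception machinery.
    if name in planes:
        return planes[name]
    return None

# The miss case (a plane not yet registered) is where EAFP pays the
# exception-raising cost on every single call.
t_eafp = timeit.timeit(lambda: eafp("FAKE"), number=100_000)
t_lbyl = timeit.timeit(lambda: lbyl("FAKE"), number=100_000)
print(f"EAFP miss: {t_eafp:.3f}s  LBYL miss: {t_lbyl:.3f}s")
```

      Even in CPython the LBYL miss path is noticeably cheaper; when the exception originates in C++ and has to be translated by pybind11, the gap widens dramatically, which is what the profiles on macOS ARM show.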


            Activity

            erykoff Eli Rykoff added a comment -

            Yes please!
            tjenness Tim Jenness added a comment -

            Jim Bosch we've decided to just do the main fix on this ticket. Jenkins is running. The change is trivial.
            tjenness Tim Jenness added a comment -

            Thanks for the quick review. I merged what we had so the next weekly won't kill Eli Rykoff's laptop.
            tjenness Tim Jenness added a comment -

            ARM is still slower than the old Intel Mac mini. Running line_profiler on the realize method (the measure method is C++) I get:

            Total time: 155.226 s
            File: /Users/timj/work/lsstsw/build/meas_base/python/lsst/meas/base/tests.py
            Function: realize at line 518
             
            Line #      Hits         Time  Per Hit   % Time  Line Contents
            ==============================================================
               518                                               @profile
               519                                               def realize(self, noise, schema, randomSeed=1):
               520                                                   r"""Simulate an exposure and detection catalog for this dataset.
               521                                           
               522                                                   The simulation includes noise, and the detection catalog includes
               523                                                   `~lsst.afw.detection.heavyFootprint.HeavyFootprint`\ s.
               524                                           
               525                                                   Parameters
               526                                                   ----------
               527                                                   noise : `float`
               528                                                       Standard deviation of noise to be added to the exposure.  The
               529                                                       noise will be Gaussian and constant, appropriate for the
               530                                                       sky-limited regime.
               531                                                   schema : `lsst.afw.table.Schema`
               532                                                       Schema of the new catalog to be created.  Must start with
               533                                                       ``self.schema`` (i.e. ``schema.contains(self.schema)`` must be
               534                                                       `True`), but typically contains fields for already-configured
               535                                                       measurement algorithms as well.
               536                                                   randomSeed : `int`, optional
               537                                                       Seed for the random number generator.
               538                                                       If `None`, a seed is chosen automatically.
               539                                           
               540                                                   Returns
               541                                                   -------
               542                                                   `exposure` : `lsst.afw.image.ExposureF`
               543                                                       Simulated image.
               544                                                   `catalog` : `lsst.afw.table.SourceCatalog`
               545                                                       Simulated detection catalog.
               546                                                   """
               547    127205   13221632.0    103.9      8.5          random_state = np.random.RandomState(randomSeed)
               548    127205     230427.0      1.8      0.1          assert schema.contains(self.schema)
               549    127205     301874.0      2.4      0.2          mapper = lsst.afw.table.SchemaMapper(self.schema)
               550    127205    1154319.0      9.1      0.7          mapper.addMinimalSchema(self.schema, True)
               551    127205    2574582.0     20.2      1.7          exposure = self.exposure.clone()
               552    127205     950954.0      7.5      0.6          exposure.getMaskedImage().getVariance().getArray()[:, :] = noise**2
               553    254410    2588007.0     10.2      1.7          exposure.getMaskedImage().getImage().getArray()[:, :] \
               554    127205   44130268.0    346.9     28.4              += random_state.randn(exposure.getHeight(), exposure.getWidth())*noise
               555    127205   52919218.0    416.0     34.1          catalog = lsst.afw.table.SourceCatalog(schema)
               556    127205     796583.0      6.3      0.5          catalog.extend(self.catalog, mapper=mapper)
               557                                                   # Loop over sources and generate new HeavyFootprints that divide up
               558                                                   # the noisy pixels, not the ideal no-noise pixels.
               559    254410   36064665.0    141.8     23.2          for record in catalog:
               560                                                       # parent objects have non-Heavy Footprints, which don't need to be
               561                                                       # updated after adding noise.
               562    127205     147800.0      1.2      0.1              if record.getParent() == 0:
               563    127205      67545.0      0.5      0.0                  continue
               564                                                       # get flattened arrays that correspond to the no-noise and noisy
               565                                                       # parent images
               566                                                       parent = catalog.find(record.getParent())
               567                                                       footprint = parent.getFootprint()
               568                                                       parentFluxArrayNoNoise = np.zeros(footprint.getArea(), dtype=np.float32)
               569                                                       footprint.spans.flatten(parentFluxArrayNoNoise,
               570                                                                               self.exposure.getMaskedImage().getImage().getArray(),
               571                                                                               self.exposure.getXY0())
               572                                                       parentFluxArrayNoisy = np.zeros(footprint.getArea(), dtype=np.float32)
               573                                                       footprint.spans.flatten(parentFluxArrayNoisy,
               574                                                                               exposure.getMaskedImage().getImage().getArray(),
               575                                                                               exposure.getXY0())
               576                                                       oldHeavy = record.getFootprint()
               577                                                       fraction = (oldHeavy.getImageArray() / parentFluxArrayNoNoise)
               578                                                       # N.B. this isn't a copy ctor - it's a copy from a vanilla
               579                                                       # Footprint, so it doesn't copy the arrays we don't want to
               580                                                       # change, and hence we have to do that ourselves below.
               581                                                       newHeavy = lsst.afw.detection.HeavyFootprintF(oldHeavy)
               582                                                       newHeavy.getImageArray()[:] = parentFluxArrayNoisy*fraction
               583                                                       newHeavy.getMaskArray()[:] = oldHeavy.getMaskArray()
               584                                                       newHeavy.getVarianceArray()[:] = oldHeavy.getVarianceArray()
               585                                                       record.setFootprint(newHeavy)
               586    127205      77742.0      0.6      0.1          return exposure, catalog
            

            The 23% in line 559 seems to be what triggers the __getitem__ calls: SourceCatalog does not define an __iter__ method, so Python falls back to calling __getitem__ repeatedly. Jim Bosch is there a reason that SourceCatalog doesn't have an optimized iterator that doesn't have to go through the same fancy logic as __getitem__?
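
            The fallback being described is Python's legacy sequence iteration protocol: when a class lacks __iter__, a for-loop calls __getitem__ with 0, 1, 2, ... until IndexError. A minimal toy class (not the real afw SourceCatalog) shows the behaviour:

```python
class Catalog:
    """Toy catalog with no __iter__; a hypothetical stand-in, not afw."""

    def __init__(self, records):
        self._records = records

    def __getitem__(self, key):
        # Python's legacy sequence protocol calls this with 0, 1, 2, ...
        # until IndexError, so every loop step pays the full dispatch
        # cost of __getitem__.
        if isinstance(key, int):
            if key >= len(self._records):
                raise IndexError(key)
            return self._records[key]
        raise TypeError(key)


cat = Catalog(["a", "b", "c"])
# for-loops and list() work even without __iter__, via the fallback.
assert list(cat) == ["a", "b", "c"]
```

            This is why the per-record iteration cost in realize scales with whatever general-purpose logic __getitem__ carries.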

            tjenness Tim Jenness added a comment - edited

            Although looking at __getitem__, it seems that 94% of the time is spent in _getitem_ in the pybind11 layer, which itself has to be prepared to handle strings or integers.

            Total time: 203.359 s
            File: /Users/timj/work/lsstsw/build/afw/python/lsst/afw/table/_base.py
            Function: __getitem__ at line 85
             
            Line #      Hits         Time  Per Hit   % Time  Line Contents
            ==============================================================
                85                                               @profile
                86                                               def __getitem__(self, key):
                87                                                   """Return the record at index key if key is an integer,
                88                                                   return a column if `key` is a string field name or Key,
                89                                                   or return a subset of the catalog if key is a slice
                90                                                   or boolean NumPy array.
                91                                                   """
                92   1908275     803625.0      0.4      0.4          if type(key) is slice:
                93    381615     211729.0      0.6      0.1              (start, stop, step) = (key.start, key.stop, key.step)
                94    381615     133328.0      0.3      0.1              if step is None:
                95    127205      42708.0      0.3      0.0                  step = 1
                96    381615     121868.0      0.3      0.1              if start is None:
                97                                                           start = 0
                98    381615     118859.0      0.3      0.1              if stop is None:
                99                                                           stop = len(self)
               100    381615     583426.0      1.5      0.3              return self.subset(start, stop, step)
               101   1526660     669792.0      0.4      0.3          elif isinstance(key, np.ndarray):
               102                                                       if key.dtype == bool:
               103                                                           return self.subset(key)
               104                                                       raise RuntimeError(f"Unsupported array type for indexing non-contiguous Catalog: {key.dtype}")
               105   1526660    9493937.0      6.2      4.7          elif isinstance(key, Key) or isinstance(key, str):
               106       200        922.0      4.6      0.0              if not self.isContiguous():
               107                                                           if isinstance(key, str):
               108                                                               key = self.schema[key].asKey()
               109                                                           array = self._getitem_(key)
               110                                                           # This array doesn't share memory with the Catalog, so don't let it be modified by
               111                                                           # the user who thinks that the Catalog itself is being modified.
               112                                                           # Just be aware that this array can only be passed down to C++ as an ndarray::Array<T const>
               113                                                           # instead of an ordinary ndarray::Array<T>. If pybind isn't letting it down into C++,
               114                                                           # you may have left off the 'const' in the definition.
               115                                                           array.flags.writeable = False
               116                                                           return array
               117       200       3316.0     16.6      0.0              return self.columns[key]
               118                                                   else:
               119   1526460  191175448.0    125.2     94.0              return self._getitem_(key)
            

            So a specialized iterator implemented in the pybind11 layer might be the best answer.
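
            The Python-level shape of that fix can be sketched with the same toy class (again a hypothetical stand-in, not the afw API): once __iter__ exists, for-loops stop routing through the general-purpose __getitem__ dispatch entirely.

```python
class Catalog:
    """Toy stand-in for SourceCatalog, with a dedicated iterator."""

    def __init__(self, records):
        self._records = records

    def __getitem__(self, key):
        # General-purpose dispatch: slices, field names, integer indices.
        # This is the "fancy logic" a plain for-loop should not pay for.
        if isinstance(key, slice):
            return Catalog(self._records[key])
        if isinstance(key, str):
            raise KeyError(f"no column {key!r} in this toy example")
        return self._records[key]

    def __iter__(self):
        # Dedicated iterator: walks the records directly, skipping
        # __getitem__ dispatch.  In afw the equivalent would be an
        # iterator exported from the pybind11/C++ layer.
        return iter(self._records)


cat = Catalog([1, 2, 3])
assert list(cat) == [1, 2, 3]  # now uses __iter__, not __getitem__
```

            A real fix would export the underlying C++ container's iterator through pybind11, so each loop step avoids both the Python-level dispatch and the type checks in _getitem_.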


              People

              Assignee:
              tjenness Tim Jenness
              Reporter:
              tjenness Tim Jenness
              Reviewers:
              Jim Bosch
              Watchers:
              Eli Rykoff, Jim Bosch, Kian-Tat Lim, Tim Jenness
              Votes:
              0

