# trailedSources tests very slow on macOS ARM


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Team: Architecture
• Urgent?: No

#### Description

DM-33325 added some Monte Carlo tests to meas_extensions_trailedSources and they are a disaster for macOS ARM CPUs.

On an old Intel Mac mini:

```
============================= slowest 5 durations ==============================
32.91s call tests/test_trailedSources.py::TrailedSourcesTestCase_18.823189933596083_5.042446276240214_52.688133441854_0.5272863678675188::testMonteCarlo
32.42s call tests/test_trailedSources.py::TrailedSourcesTestCase_18.824525959248994_3.690341720399934_12.706783496622066_80.0121374592317::testMonteCarlo
30.48s call tests/test_trailedSources.py::TrailedSourcesTestCase_18.778338539085038_3.712755610201576_77.74789748127772_8.918117596264874::testMonteCarlo
29.14s call tests/test_trailedSources.py::TrailedSourcesTestCase_19.632089387567788_4.363012829054862_13.575047038897603_39.65613368621441::testMonteCarlo
28.94s call tests/test_trailedSources.py::TrailedSourcesTestCase_18.978084668646876_0.541677473106708_34.88128456167191_52.19822420855503::testMonteCarlo
================= 161 passed, 21 warnings in 144.23s (0:02:24) =================
```

but on an M1 Pro Mac:

```
============================= slowest 5 durations ==============================
213.96s call tests/test_trailedSources.py::TrailedSourcesTestCase_18.823189933596083_5.042446276240214_52.688133441854_0.5272863678675188::testMonteCarlo
209.18s call tests/test_trailedSources.py::TrailedSourcesTestCase_18.824525959248994_3.690341720399934_12.706783496622066_80.0121374592317::testMonteCarlo
185.70s call tests/test_trailedSources.py::TrailedSourcesTestCase_18.778338539085038_3.712755610201576_77.74789748127772_8.918117596264874::testMonteCarlo
176.01s call tests/test_trailedSources.py::TrailedSourcesTestCase_15.552399069150322_1.2533910548710816_24.737438752106133_87.04870569671161::testMonteCarlo
166.66s call tests/test_trailedSources.py::TrailedSourcesTestCase_19.701132323
================= 161 passed, 21 warnings in 652.78s (0:10:52) =================
```

It turns out that all the time is spent calling getMaskPlane in the meas_base noise replacer:

```python
try:
    # does it already exist?
    plane = mask.getMaskPlane(maskname)
    if self.log:
        self.log.debug('Mask plane "%s" already existed', maskname)
except Exception:
    # if not, add it; we should delete it when done.
    plane = mask.addMaskPlane(maskname)
    self.removeplanes.append(maskname)
```

This indicates that this is another example of DM-33478.

Replacing the try block with

```python
if maskname in mask.getMaskPlaneDict():
```

fixes the problem since it avoids the C++ layer raising an exception.
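A minimal, self-contained sketch of the pattern may help. `FakeMask` below is a hypothetical dict-backed stand-in (not the real `lsst.afw.image.Mask`, whose exception comes from C++), chosen so the example runs anywhere; it shows why testing membership in `getMaskPlaneDict()` first keeps the hot path exception-free:

```python
class FakeMask:
    """Hypothetical stand-in for the afw Mask plane registry."""

    def __init__(self):
        self._planes = {"BAD": 0, "SAT": 1}

    def getMaskPlaneDict(self):
        return dict(self._planes)

    def getMaskPlane(self, name):
        try:
            return self._planes[name]
        except KeyError:
            # Mimics the expensive exception raised by the C++ layer.
            raise RuntimeError(f"mask plane {name!r} not found")

    def addMaskPlane(self, name):
        plane = len(self._planes)
        self._planes[name] = plane
        return plane


def ensure_plane(mask, maskname, removeplanes):
    """Look before you leap: the membership test means no exception is raised."""
    if maskname in mask.getMaskPlaneDict():
        return mask.getMaskPlane(maskname)
    # Plane didn't exist; add it and remember to delete it when done.
    plane = mask.addMaskPlane(maskname)
    removeplanes.append(maskname)
    return plane


removed = []
mask = FakeMask()
existing = ensure_plane(mask, "BAD", removed)    # existing plane, nothing scheduled
added = ensure_plane(mask, "NOISE", removed)     # new plane, scheduled for removal
print(existing, added, removed)                  # 0 2 ['NOISE']
```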

#### Attachments

1. full.png
350 kB
2. test_trailedSources_profile.png
848 kB

#### Activity

Eli Rykoff added a comment -

Yes please!
Tim Jenness added a comment -

Jim Bosch we've decided to just do the main fix on this ticket. Jenkins is running. The change is trivial.
Tim Jenness added a comment -

Thanks for the quick review. I merged what we had so the next weekly won't kill Eli Rykoff's laptop.
Tim Jenness added a comment -

ARM is still slower than the old Mac mini. Running line_profiler on the realize method (the measure method is in C++) I get:

```
Total time: 155.226 s
File: /Users/timj/work/lsstsw/build/meas_base/python/lsst/meas/base/tests.py
Function: realize at line 518

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   518                                           @profile
   519                                           def realize(self, noise, schema, randomSeed=1):
   520                                               r"""Simulate an exposure and detection catalog for this dataset.
   521
   522                                               The simulation includes noise, and the detection catalog includes
   523                                               ~lsst.afw.detection.heavyFootprint.HeavyFootprint\ s.
   524
   525                                               Parameters
   526                                               ----------
   527                                               noise : float
   528                                                   Standard deviation of noise to be added to the exposure. The
   529                                                   noise will be Gaussian and constant, appropriate for the
   530                                                   sky-limited regime.
   531                                               schema : lsst.afw.table.Schema
   532                                                   Schema of the new catalog to be created. Must start with
   533                                                   self.schema (i.e. schema.contains(self.schema) must be
   534                                                   True), but typically contains fields for already-configured
   535                                                   measurement algorithms as well.
   536                                               randomSeed : int, optional
   537                                                   Seed for the random number generator.
   538                                                   If None, a seed is chosen automatically.
   539
   540                                               Returns
   541                                               -------
   542                                               exposure : lsst.afw.image.ExposureF
   543                                                   Simulated image.
   544                                               catalog : lsst.afw.table.SourceCatalog
   545                                                   Simulated detection catalog.
   546                                               """
   547    127205   13221632.0    103.9      8.5      random_state = np.random.RandomState(randomSeed)
   548    127205     230427.0      1.8      0.1      assert schema.contains(self.schema)
   549    127205     301874.0      2.4      0.2      mapper = lsst.afw.table.SchemaMapper(self.schema)
   550    127205    1154319.0      9.1      0.7      mapper.addMinimalSchema(self.schema, True)
   551    127205    2574582.0     20.2      1.7      exposure = self.exposure.clone()
   552    127205     950954.0      7.5      0.6      exposure.getMaskedImage().getVariance().getArray()[:, :] = noise**2
   553    254410    2588007.0     10.2      1.7      exposure.getMaskedImage().getImage().getArray()[:, :] \
   554    127205   44130268.0    346.9     28.4          += random_state.randn(exposure.getHeight(), exposure.getWidth())*noise
   555    127205   52919218.0    416.0     34.1      catalog = lsst.afw.table.SourceCatalog(schema)
   556    127205     796583.0      6.3      0.5      catalog.extend(self.catalog, mapper=mapper)
   557                                               # Loop over sources and generate new HeavyFootprints that divide up
   558                                               # the noisy pixels, not the ideal no-noise pixels.
   559    254410   36064665.0    141.8     23.2      for record in catalog:
   560                                                   # parent objects have non-Heavy Footprints, which don't need to be
   561                                                   # updated after adding noise.
   562    127205     147800.0      1.2      0.1          if record.getParent() == 0:
   563    127205      67545.0      0.5      0.0              continue
   564                                                   # get flattened arrays that correspond to the no-noise and noisy
   565                                                   # parent images
   566                                                   parent = catalog.find(record.getParent())
   567                                                   footprint = parent.getFootprint()
   568                                                   parentFluxArrayNoNoise = np.zeros(footprint.getArea(), dtype=np.float32)
   569                                                   footprint.spans.flatten(parentFluxArrayNoNoise,
   570                                                                           self.exposure.getMaskedImage().getImage().getArray(),
   571                                                                           self.exposure.getXY0())
   572                                                   parentFluxArrayNoisy = np.zeros(footprint.getArea(), dtype=np.float32)
   573                                                   footprint.spans.flatten(parentFluxArrayNoisy,
   574                                                                           exposure.getMaskedImage().getImage().getArray(),
   575                                                                           exposure.getXY0())
   576                                                   oldHeavy = record.getFootprint()
   577                                                   fraction = (oldHeavy.getImageArray() / parentFluxArrayNoNoise)
   578                                                   # N.B. this isn't a copy ctor - it's a copy from a vanilla
   579                                                   # Footprint, so it doesn't copy the arrays we don't want to
   580                                                   # change, and hence we have to do that ourselves below.
   581                                                   newHeavy = lsst.afw.detection.HeavyFootprintF(oldHeavy)
   582                                                   newHeavy.getImageArray()[:] = parentFluxArrayNoisy*fraction
   583                                                   newHeavy.getMaskArray()[:] = oldHeavy.getMaskArray()
   584                                                   newHeavy.getVarianceArray()[:] = oldHeavy.getVarianceArray()
   585                                                   record.setFootprint(newHeavy)
   586    127205      77742.0      0.6      0.1      return exposure, catalog
```

The 23% in line 559 seems to be the thing that triggers the __getitem__ calls – SourceCatalog does not have an __iter__ method, so Python calls __getitem__ repeatedly to iterate. Jim Bosch, is there a reason that SourceCatalog doesn't have an optimized iterator that doesn't have to go through the same fancy logic as __getitem__?
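As background, a class with `__getitem__` but no `__iter__` is still iterable: Python falls back to the legacy sequence protocol, calling `__getitem__(0)`, `__getitem__(1)`, ... until `IndexError`, so every loop step pays the full `__getitem__` dispatch cost. A toy illustration (not LSST code; `Catalog` here is a hypothetical stand-in):

```python
class Catalog:
    """Has __getitem__ but deliberately no __iter__."""

    def __init__(self, records):
        self._records = records
        self.getitem_calls = 0

    def __getitem__(self, key):
        self.getitem_calls += 1          # count the per-step dispatch overhead
        return self._records[key]        # raises IndexError past the end, stopping the loop


cat = Catalog(["a", "b", "c"])
items = [r for r in cat]                 # iteration works despite no __iter__
print(items)                             # ['a', 'b', 'c']
print(cat.getitem_calls)                 # 4: one call per record plus the IndexError probe
```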

Tim Jenness added a comment - edited

Although looking at __getitem__ it seems that 95% of the time is spent in _getitem_ in the pybind11 layer, which itself has to be prepared to handle strings or integers.

```
Total time: 203.359 s
File: /Users/timj/work/lsstsw/build/afw/python/lsst/afw/table/_base.py
Function: __getitem__ at line 85

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    85                                           @profile
    86                                           def __getitem__(self, key):
    87                                               """Return the record at index key if key is an integer,
    88                                               return a column if key is a string field name or Key,
    89                                               or return a subset of the catalog if key is a slice
    90                                               or boolean NumPy array.
    91                                               """
    92   1908275     803625.0      0.4      0.4      if type(key) is slice:
    93    381615     211729.0      0.6      0.1          (start, stop, step) = (key.start, key.stop, key.step)
    94    381615     133328.0      0.3      0.1          if step is None:
    95    127205      42708.0      0.3      0.0              step = 1
    96    381615     121868.0      0.3      0.1          if start is None:
    97                                                       start = 0
    98    381615     118859.0      0.3      0.1          if stop is None:
    99                                                       stop = len(self)
   100    381615     583426.0      1.5      0.3          return self.subset(start, stop, step)
   101   1526660     669792.0      0.4      0.3      elif isinstance(key, np.ndarray):
   102                                                   if key.dtype == bool:
   103                                                       return self.subset(key)
   104                                                   raise RuntimeError(f"Unsupported array type for indexing non-contiguous Catalog: {key.dtype}")
   105   1526660    9493937.0      6.2      4.7      elif isinstance(key, Key) or isinstance(key, str):
   106       200        922.0      4.6      0.0          if not self.isContiguous():
   107                                                       if isinstance(key, str):
   108                                                           key = self.schema[key].asKey()
   109                                                       array = self._getitem_(key)
   110                                                       # This array doesn't share memory with the Catalog, so don't let it be modified by
   111                                                       # the user who thinks that the Catalog itself is being modified.
   112                                                       # Just be aware that this array can only be passed down to C++ as an ndarray::Array<T const>
   113                                                       # instead of an ordinary ndarray::Array<T>. If pybind isn't letting it down into C++,
   114                                                       # you may have left off the 'const' in the definition.
   115                                                       array.flags.writeable = False
   116                                                       return array
   117       200       3316.0     16.6      0.0          return self.columns[key]
   118                                               else:
   119   1526460  191175448.0    125.2     94.0          return self._getitem_(key)
```

So a specialist iterator coming out of the pybind11 layer might be the best answer.
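A sketch of what such an iterator could look like at the Python level (names like `FastCatalog` are hypothetical, and the real fix would live in the pybind11 wrapper): once `__iter__` exists, iteration walks the underlying records directly and never enters the slice/string/Key dispatch in `__getitem__`.

```python
class FastCatalog:
    """Hypothetical catalog with a dedicated iterator bypassing __getitem__."""

    def __init__(self, records):
        self._records = records
        self.getitem_calls = 0

    def __getitem__(self, key):
        # The fancy dispatch (slices, field names, boolean masks) would live
        # here; for iteration it is never reached once __iter__ is defined.
        self.getitem_calls += 1
        return self._records[key]

    def __iter__(self):
        # Direct iterator over the underlying storage: no per-record dispatch.
        return iter(self._records)


cat = FastCatalog([1, 2, 3])
total = sum(cat)                 # uses __iter__, never touching __getitem__
print(total)                     # 6
print(cat.getitem_calls)         # 0
```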


#### People

Assignee:
Tim Jenness
Reporter:
Tim Jenness
Reviewers:
Jim Bosch
Watchers:
Eli Rykoff, Jim Bosch, Kian-Tat Lim, Tim Jenness
Votes:
0

#### Dates

Created:
Updated:
Resolved:

#### Jenkins

No builds found.