# Investigate PSF cache misses

XMLWordPrintable

## Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Team:
External

## Description

Profiling reveals that Psf::computeKernelImage is the tall pole in measureCoaddSources.py, taking 47.5% of the runtime (much of this is due to inefficiencies in AST; DM-13847). The calling graphs for Psf::computeKernelImage and Psf::computeShape (which calls Psf::computeKernelImage) show multiple callers with significant contributions, which suggests that the caching functionality in Psf is not as effective as we desire. Investigate these cache misses and devise a plan for improving the cache hits.

  0.0 .......... 0.01 / 0.36 lsst::afw::detection::Psf::getLocalKernel(lsst::afw::geom::Point, lsst::afw::image::Color) const [2074]  0.0 .......... 0.13 / 32.09 lsst::meas::modelfit::DoubleShapeletPsfApproxAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure const&) const [347]  7.1 .......... 745.86 / 752.92 lsst::meas::extensions::shapeHSM::HsmPsfMomentsAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure const&) const [160]  7.3 .......... 772.18 / 775.05 lsst::afw::detection::Psf::doComputeImage(lsst::afw::geom::Point const&, lsst::afw::image::Color const&) const [156]  33.1 .......... 3'482.05 / 3'495.95 lsst::meas::algorithms::ImagePsf::doComputeShape(lsst::afw::geom::Point const&, lsst::afw::image::Color const&) const [73] [67] 47.5 5'000.21 0.07 / 5'000.14 lsst::afw::detection::Psf::computeKernelImage(lsst::afw::geom::Point, lsst::afw::image::Color, lsst::afw::detection::Psf::ImageOwnerEnum) const  47.5 .......... 4'999.65 / 4'999.65 lsst::meas::algorithms::CoaddPsf::doComputeKernelImage(lsst::afw::geom::Point const&, lsst::afw::image::Color const&) const [68]  0.0 .......... 0.38 / 21.81 lsst::afw::image::Image::Image(lsst::afw::image::Image const&, bool) [411]  0.0 .......... 0.10 / 3.91 lsst::afw::image::Image::~Image() [759]  0.0 .......... 0.01 / 136.99 operator new(unsigned long) [229]  0.0 .......... 0.01 / 0.05 std::_Sp_counted_ptr_inplace, std::allocator >, (__gnu_cxx::_Lock_policy)2>::_M_destroy() [3651]   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -     2.5 .......... 263.99 / 877.79 lsst::meas::extensions::photometryKron::KronFluxAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure const&) const [143]  5.6 .......... 587.63 / 587.69 lsst::meas::extensions::photometryKron::calculatePsfKronRadius(std::shared_ptr const&, lsst::afw::geom::Point const&, double) [171]  7.2 .......... 757.15 / 778.02 lsst::meas::base::SdssShapeAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure const&) const [154]  7.2 .......... 760.64 / 763.68 lsst::meas::base::LocalBackgroundAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure const&) const [158]  7.4 .......... 774.28 / 799.98 lsst::meas::base::SdssCentroidAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure const&) const [151] [72] 33.2 3'495.98 0.03 / 3'495.95 lsst::afw::detection::Psf::computeShape(lsst::afw::geom::Point, lsst::afw::image::Color) const  33.2 .......... 3'495.95 / 3'495.95 lsst::meas::algorithms::ImagePsf::doComputeShape(lsst::afw::geom::Point const&, lsst::afw::image::Color const&) const [73] 

## Activity

Hide
Jim Bosch added a comment -

I put in the forward declaration. This requires small changes in an additional 4 packages in order to pull in the definitions.

I've left a few comments on the afw PR that will hopefully get you to the point where the changes in the new packages aren't necessary.  If not, I'm happy to help more; they definitely should not be necessary.

I would like to fuzz the comparisons (I'm thinking of just int(x/precision), where precision can be user-defined), but because I'm targeting this work for the HSC production run which is already underway, I would like to avoid making any scientific changes on this ticket, no matter how small we believe their effect to be. I will put in a ticket to fuzz the comparisons that can go in separately.

+1

measureCoaddSources.py doesn't use a lot of memory: top on tiger3 typically shows it using 1-and-a-bit percent of the total memory, which is a bit over a gigabyte. So though I think there's plenty of room for the cache, the table above shows rather modest gains for sizes about 100, so I backed it off to that.

That may be true in the usual case, but if we're having to configure maximum footprint sizes or peak counts for the deblender, it can't possibly be true in the worst case.  Glad that we can get most of the benefits from a much smaller cache.

Show
Jim Bosch added a comment - I put in the forward declaration. This requires small changes in an additional 4 packages in order to pull in the definitions. I've left a few comments on the afw PR that will hopefully get you to the point where the changes in the new packages aren't necessary.  If not, I'm happy to help more; they definitely should not be necessary. I would like to fuzz the comparisons (I'm thinking of just int(x/precision), where precision can be user-defined), but because I'm targeting this work for the HSC production run which is already underway, I would like to avoid making any scientific changes on this ticket, no matter how small we believe their effect to be. I will put in a ticket to fuzz the comparisons that can go in separately. +1 measureCoaddSources.py  doesn't use a lot of memory:  top  on tiger3 typically shows it using 1-and-a-bit percent of the total memory, which is a bit over a gigabyte. So though I think there's plenty of room for the cache, the table above shows rather modest gains for sizes about 100, so I backed it off to that.  That may be true in the usual case, but if we're having to configure maximum footprint sizes or peak counts for the deblender, it can't possibly be true in the worst case.  Glad that we can get most of the benefits from a much smaller cache.
Hide
Paul Price added a comment - - edited

Thanks for walking me through this, Jim. I believe I've got it right now, and Jenkins is green. Do you want to have another quick look in afw?

Edit: I linked to the wrong Jenkins before. The proper Jenkins has one red, but that looks to be a system problem (despite multiple restarts) that I feel justified to ignore.

Show
Paul Price added a comment - - edited Thanks for walking me through this, Jim. I believe I've got it right now, and Jenkins is green. Do you want to have another quick look in afw? Edit: I linked to the wrong Jenkins before. The proper Jenkins has one red, but that looks to be a system problem (despite multiple restarts) that I feel justified to ignore.
Hide
Jim Bosch added a comment -

Yup, looks much better. I left two tiny comments on the afw PR, but these are just style issues now, not anything structural.

Show
Jim Bosch added a comment - Yup, looks much better. I left two tiny comments on the afw PR, but these are just style issues now, not anything structural.
Hide
Paul Price added a comment -

Final measured gain is 42.7% (10,517 sec --> 6,025 sec), in combination with DM-13847.

Show
Paul Price added a comment - Final measured gain is 42.7% (10,517 sec --> 6,025 sec), in combination with DM-13847 .
Hide
Paul Price added a comment -

Final Jenkins is green.

Merged to master.

Show
Paul Price added a comment - Final Jenkins is green. Merged to master.

## People

• Assignee:
Paul Price
Reporter:
Paul Price
Reviewers:
Jim Bosch
Watchers:
Jim Bosch, Paul Price