Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-13854

Investigate PSF cache misses

    Details

    • Team:
      External

      Description

      Profiling reveals that Psf::computeKernelImage is the tall pole in measureCoaddSources.py, taking 47.5% of the runtime (much of this is due to inefficiencies in AST; DM-13847). The calling graphs for Psf::computeKernelImage and Psf::computeShape (which calls Psf::computeKernelImage) show multiple callers with significant contributions, which suggests that the caching functionality in Psf is not as effective as we desire. Investigate these cache misses and devise a plan for improving the cache hits.

                  0.0  ..........        0.01 / 0.36          lsst::afw::detection::Psf::getLocalKernel(lsst::afw::geom::Point<double, 2>, lsst::afw::image::Color) const [2074]
                  0.0  ..........        0.13 / 32.09         lsst::meas::modelfit::DoubleShapeletPsfApproxAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure<float, int, float> const&) const [347]
                  7.1  ..........      745.86 / 752.92        lsst::meas::extensions::shapeHSM::HsmPsfMomentsAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure<float, int, float> const&) const [160]
                  7.3  ..........      772.18 / 775.05        lsst::afw::detection::Psf::doComputeImage(lsst::afw::geom::Point<double, 2> const&, lsst::afw::image::Color const&) const [156]
                 33.1  ..........    3'482.05 / 3'495.95      lsst::meas::algorithms::ImagePsf::doComputeShape(lsst::afw::geom::Point<double, 2> const&, lsst::afw::image::Color const&) const [73]
      [67]       47.5    5'000.21        0.07 / 5'000.14    lsst::afw::detection::Psf::computeKernelImage(lsst::afw::geom::Point<double, 2>, lsst::afw::image::Color, lsst::afw::detection::Psf::ImageOwnerEnum) const
                 47.5  ..........    4'999.65 / 4'999.65      lsst::meas::algorithms::CoaddPsf::doComputeKernelImage(lsst::afw::geom::Point<double, 2> const&, lsst::afw::image::Color const&) const [68]
                  0.0  ..........        0.38 / 21.81         lsst::afw::image::Image<double>::Image(lsst::afw::image::Image<double> const&, bool) [411]
                  0.0  ..........        0.10 / 3.91          lsst::afw::image::Image<double>::~Image() [759]
                  0.0  ..........        0.01 / 136.99        operator new(unsigned long) [229]
                  0.0  ..........        0.01 / 0.05          std::_Sp_counted_ptr_inplace<lsst::afw::image::Image<double>, std::allocator<lsst::afw::image::Image<double> >, (__gnu_cxx::_Lock_policy)2>::_M_destroy() [3651]
       
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
       
                  2.5  ..........      263.99 / 877.79        lsst::meas::extensions::photometryKron::KronFluxAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure<float, int, float> const&) const [143]
                  5.6  ..........      587.63 / 587.69        lsst::meas::extensions::photometryKron::calculatePsfKronRadius(std::shared_ptr<lsst::afw::detection::Psf const> const&, lsst::afw::geom::Point<double, 2> const&, double) [171]
                  7.2  ..........      757.15 / 778.02        lsst::meas::base::SdssShapeAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure<float, int, float> const&) const [154]
                  7.2  ..........      760.64 / 763.68        lsst::meas::base::LocalBackgroundAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure<float, int, float> const&) const [158]
                  7.4  ..........      774.28 / 799.98        lsst::meas::base::SdssCentroidAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure<float, int, float> const&) const [151]
      [72]       33.2    3'495.98        0.03 / 3'495.95    lsst::afw::detection::Psf::computeShape(lsst::afw::geom::Point<double, 2>, lsst::afw::image::Color) const
                 33.2  ..........    3'495.95 / 3'495.95      lsst::meas::algorithms::ImagePsf::doComputeShape(lsst::afw::geom::Point<double, 2> const&, lsst::afw::image::Color const&) const [73]
      

        Attachments

          Issue Links

            Activity

            Hide
            jbosch Jim Bosch added a comment -

            I put in the forward declaration. This requires small changes in an additional 4 packages in order to pull in the definitions.

            I've left a few comments on the afw PR that will hopefully get you to the point where the changes in the new packages aren't necessary.  If not, I'm happy to help more; they definitely should not be necessary.

            I would like to fuzz the comparisons (I'm thinking of just int(x/precision), where precision can be user-defined), but because I'm targeting this work for the HSC production run which is already underway, I would like to avoid making any scientific changes on this ticket, no matter how small we believe their effect to be. I will put in a ticket to fuzz the comparisons that can go in separately.

            +1

            measureCoaddSources.py doesn't use a lot of memory: top on tiger3 typically shows it using 1-and-a-bit percent of the total memory, which is a bit over a gigabyte. So though I think there's plenty of room for the cache, the table above shows rather modest gains for sizes about 100, so I backed it off to that.

             That may be true in the usual case, but if we're having to configure maximum footprint sizes or peak counts for the deblender, it can't possibly be true in the worst case.  Glad that we can get most of the benefits from a much smaller cache.

             

            Show
            jbosch Jim Bosch added a comment - I put in the forward declaration. This requires small changes in an additional 4 packages in order to pull in the definitions. I've left a few comments on the afw PR that will hopefully get you to the point where the changes in the new packages aren't necessary.  If not, I'm happy to help more; they definitely should not be necessary. I would like to fuzz the comparisons (I'm thinking of just int(x/precision), where precision can be user-defined), but because I'm targeting this work for the HSC production run which is already underway, I would like to avoid making any scientific changes on this ticket, no matter how small we believe their effect to be. I will put in a ticket to fuzz the comparisons that can go in separately. +1 measureCoaddSources.py  doesn't use a lot of memory:  top  on tiger3 typically shows it using 1-and-a-bit percent of the total memory, which is a bit over a gigabyte. So though I think there's plenty of room for the cache, the table above shows rather modest gains for sizes about 100, so I backed it off to that.  That may be true in the usual case, but if we're having to configure maximum footprint sizes or peak counts for the deblender, it can't possibly be true in the worst case.  Glad that we can get most of the benefits from a much smaller cache.  
            Hide
            price Paul Price added a comment - - edited

            Thanks for walking me through this, Jim. I believe I've got it right now, and Jenkins is green. Do you want to have another quick look in afw?

            Edit: I linked to the wrong Jenkins before. The proper Jenkins has one red, but that looks to be a system problem (despite multiple restarts) that I feel justified to ignore.

            Show
            price Paul Price added a comment - - edited Thanks for walking me through this, Jim. I believe I've got it right now, and Jenkins is green. Do you want to have another quick look in afw? Edit: I linked to the wrong Jenkins before. The proper Jenkins has one red, but that looks to be a system problem (despite multiple restarts) that I feel justified to ignore.
            Hide
            jbosch Jim Bosch added a comment -

            Yup, looks much better. I left two tiny comments on the afw PR, but these are just style issues now, not anything structural.

            Show
            jbosch Jim Bosch added a comment - Yup, looks much better. I left two tiny comments on the afw PR, but these are just style issues now, not anything structural.
            Hide
            price Paul Price added a comment -

            Final measured gain is 42.7% (10,517 sec --> 6,025 sec), in combination with DM-13847.

            Show
            price Paul Price added a comment - Final measured gain is 42.7% (10,517 sec --> 6,025 sec), in combination with DM-13847 .
            Hide
            price Paul Price added a comment -

            Final Jenkins is green.

            Merged to master.

            Show
            price Paul Price added a comment - Final Jenkins is green. Merged to master.

              People

              • Assignee:
                price Paul Price
                Reporter:
                price Paul Price
                Reviewers:
                Jim Bosch
                Watchers:
                Jim Bosch, Paul Price
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel