Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: afw, meas_base, pipe_tasks, utils
- Labels: None
- Team: External
Description
Profiling reveals that Psf::computeKernelImage is the tall pole in measureCoaddSources.py, taking 47.5% of the runtime (much of this is due to inefficiencies in AST; DM-13847). The call graphs for Psf::computeKernelImage and Psf::computeShape (which calls Psf::computeKernelImage) show multiple callers with significant contributions, which suggests that the caching in Psf is not as effective as we would like. Investigate these cache misses and devise a plan for improving the cache hit rate.
0.0 .......... 0.01 / 0.36 lsst::afw::detection::Psf::getLocalKernel(lsst::afw::geom::Point<double, 2>, lsst::afw::image::Color) const [2074]
0.0 .......... 0.13 / 32.09 lsst::meas::modelfit::DoubleShapeletPsfApproxAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure<float, int, float> const&) const [347]
7.1 .......... 745.86 / 752.92 lsst::meas::extensions::shapeHSM::HsmPsfMomentsAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure<float, int, float> const&) const [160]
7.3 .......... 772.18 / 775.05 lsst::afw::detection::Psf::doComputeImage(lsst::afw::geom::Point<double, 2> const&, lsst::afw::image::Color const&) const [156]
33.1 .......... 3'482.05 / 3'495.95 lsst::meas::algorithms::ImagePsf::doComputeShape(lsst::afw::geom::Point<double, 2> const&, lsst::afw::image::Color const&) const [73]
[67] 47.5 5'000.21 0.07 / 5'000.14 lsst::afw::detection::Psf::computeKernelImage(lsst::afw::geom::Point<double, 2>, lsst::afw::image::Color, lsst::afw::detection::Psf::ImageOwnerEnum) const
47.5 .......... 4'999.65 / 4'999.65 lsst::meas::algorithms::CoaddPsf::doComputeKernelImage(lsst::afw::geom::Point<double, 2> const&, lsst::afw::image::Color const&) const [68]
0.0 .......... 0.38 / 21.81 lsst::afw::image::Image<double>::Image(lsst::afw::image::Image<double> const&, bool) [411]
0.0 .......... 0.10 / 3.91 lsst::afw::image::Image<double>::~Image() [759]
0.0 .......... 0.01 / 136.99 operator new(unsigned long) [229]
0.0 .......... 0.01 / 0.05 std::_Sp_counted_ptr_inplace<lsst::afw::image::Image<double>, std::allocator<lsst::afw::image::Image<double> >, (__gnu_cxx::_Lock_policy)2>::_M_destroy() [3651]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
2.5 .......... 263.99 / 877.79 lsst::meas::extensions::photometryKron::KronFluxAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure<float, int, float> const&) const [143]
5.6 .......... 587.63 / 587.69 lsst::meas::extensions::photometryKron::calculatePsfKronRadius(std::shared_ptr<lsst::afw::detection::Psf const> const&, lsst::afw::geom::Point<double, 2> const&, double) [171]
7.2 .......... 757.15 / 778.02 lsst::meas::base::SdssShapeAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure<float, int, float> const&) const [154]
7.2 .......... 760.64 / 763.68 lsst::meas::base::LocalBackgroundAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure<float, int, float> const&) const [158]
7.4 .......... 774.28 / 799.98 lsst::meas::base::SdssCentroidAlgorithm::measure(lsst::afw::table::SourceRecord&, lsst::afw::image::Exposure<float, int, float> const&) const [151]
[72] 33.2 3'495.98 0.03 / 3'495.95 lsst::afw::detection::Psf::computeShape(lsst::afw::geom::Point<double, 2>, lsst::afw::image::Color) const
33.2 .......... 3'495.95 / 3'495.95 lsst::meas::algorithms::ImagePsf::doComputeShape(lsst::afw::geom::Point<double, 2> const&, lsst::afw::image::Color const&) const [73]
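One way to picture the suspected cache misses is the toy model below. It is only a sketch under stated assumptions, not the afw implementation: the class `CachedPsf`, the function `measure_sources`, and the access pattern (several measurement algorithms alternating between two nearby positions per source) are all hypothetical. It compares a cache that remembers only the most recent position with a small least-recently-used (LRU) cache by counting how often the expensive kernel-image computation actually runs.

```python
from collections import OrderedDict


class CachedPsf:
    """Hypothetical stand-in for a PSF with an LRU cache keyed on position.

    This is NOT the afw Psf API; it only counts expensive evaluations.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.evaluations = 0          # number of cache misses
        self._cache = OrderedDict()   # position -> kernel image

    def compute_kernel_image(self, position):
        if position in self._cache:
            # Cache hit: mark this entry as most recently used.
            self._cache.move_to_end(position)
            return self._cache[position]
        # Cache miss: pretend to do the expensive computation.
        self.evaluations += 1
        image = ("kernel image at", position)
        self._cache[position] = image
        if len(self._cache) > self.capacity:
            # Evict the least recently used entry.
            self._cache.popitem(last=False)
        return image


def measure_sources(psf, sources, calls_per_source=6):
    """Assumed access pattern: callers alternate between two positions."""
    for centroid, peak in sources:
        for i in range(calls_per_source):
            psf.compute_kernel_image(centroid if i % 2 == 0 else peak)
    return psf.evaluations


# 100 made-up sources, each with two slightly different positions.
sources = [((10.0 * n, 5.0), (10.0 * n + 0.5, 5.5)) for n in range(100)]

single = measure_sources(CachedPsf(capacity=1), sources)
small_lru = measure_sources(CachedPsf(capacity=2), sources)
print(single, small_lru)
```

Under this assumed pattern the single-entry cache never hits, while a capacity of two is already enough to compute each position only once. The real access pattern has to come from the profile above, so the useful capacity in practice may differ.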
Issue Links
- blocks DM-13665 Finalize the stack version, step, and config for the S18 PDR1 reprocessing (Done)
I've left a few comments on the afw PR that will hopefully get you to the point where the changes in the new packages aren't necessary. If not, I'm happy to help more; they definitely should not be necessary.
+1
That may be true in the usual case, but if we're having to configure maximum footprint sizes or peak counts for the deblender, it can't possibly be true in the worst case. Glad that we can get most of the benefits from a much smaller cache.