Investigate AST inefficiencies

XMLWordPrintable

Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Team:
External

Description

Profiling measureCoaddSources.py reveals that about 1/3 of the time is going to AST transforms, which is much larger than desirable. For example, 15% of the time is spent in malloc, which is used by GetMapping, which is called by the ast::Mapping::apply* family of functions; and ast::Mapping::_tran (32.9%) calls through to something called ValidateMapping.part.59 which uses a whopping 12.8%. The result is that when getting the PSF, we are spending more time in lsst::afw::geom::linearizeTransform than we are in warping the pixels to the coadd frame.

We need to determine whether our use of AST is inefficient or AST is inherently inefficient. Either way, we need to devise a strategy to deal with this.

Attachments

6.59 MB

Activity

Hide
Russell Owen added a comment -

A few notes: it is more efficient to transform many points at once, in one call to pixelToSky/skyToPixel (SkyWcs), applyForward (Transform and astshim) or astTran (AST), than callling that function once for each point. I believe the measurement framework has a few plugins that apply a WCS, which would be one call to pixelToSky per point. It is worth spending some time trying to eliminate that. I suspect it will be difficult to make such a change for forced photometry.

Given your results, it is also possible that we can save some time by caching a copy of the pixel-to-sky mapping inside SkyWcs, instead of using the contained FrameSet to perform the transformation. This would be practical because SkyWcs is immutable, but it does add some memory overhead and construction overhead (assuming we want to do the thread-safe-and-simple thing of filling the cache at construction time, instead of for the first call to pixelToSky or skyToPixel).

Show
Russell Owen added a comment - A few notes: it is more efficient to transform many points at once, in one call to pixelToSky/skyToPixel (SkyWcs), applyForward (Transform and astshim) or astTran (AST), than callling that function once for each point. I believe the measurement framework has a few plugins that apply a WCS, which would be one call to pixelToSky per point. It is worth spending some time trying to eliminate that. I suspect it will be difficult to make such a change for forced photometry. Given your results, it is also possible that we can save some time by caching a copy of the pixel-to-sky mapping inside SkyWcs, instead of using the contained FrameSet to perform the transformation. This would be practical because SkyWcs is immutable, but it does add some memory overhead and construction overhead (assuming we want to do the thread-safe-and-simple thing of filling the cache at construction time, instead of for the first call to pixelToSky or skyToPixel).
Hide
Paul Price added a comment -

I don't see what caching is possible in afw or even astshim. We're spending over a third of the time deep in the bowels of AST: it burns more time than CModel, which is impressive. If AST's large per-call overheads mean that is intended for bulk transformations, it seems like a fundamental mismatch to our operations model which operate source by source.

Show
Paul Price added a comment - I don't see what caching is possible in afw or even astshim. We're spending over a third of the time deep in the bowels of AST: it burns more time than CModel, which is impressive. If AST's large per-call overheads mean that is intended for bulk transformations, it seems like a fundamental mismatch to our operations model which operate source by source.
Hide
Russell Owen added a comment -

The caching I was suggesting was to add std::shared_ptr<TransformPoint2ToSpherePoint> _transform to SkyWcs, have it set at construction using _frameDict.getTransform("PIXELS", "SKY"), and use it for the pixelsToSky and skyToPixels methods.

Show
Russell Owen added a comment - The caching I was suggesting was to add std::shared_ptr<TransformPoint2ToSpherePoint> _transform to SkyWcs, have it set at construction using _frameDict.getTransform("PIXELS", "SKY"), and use it for the pixelsToSky and skyToPixels methods.
Hide
Paul Price added a comment -
Show
Paul Price added a comment - Aren't we already doing that? https://github.com/lsst/afw/blob/master/include/lsst/afw/geom/SkyWcs.h#L409 https://github.com/lsst/afw/blob/master/src/geom/SkyWcs.cc#L320
Hide
Russell Owen added a comment - - edited

Oops. I suggested the cache in the wrong place. The cache should go inside Transform, which is also immutable. At present Transform contains a std::shared_ptr<const ast::FrameSet> _frameSet and I'm suggesting it also gain std::shared_ptr<const ast::Mapping _cachedMapping (where the name is supposed to suggest the reason for the duplication).

Show
Russell Owen added a comment - - edited Oops. I suggested the cache in the wrong place. The cache should go inside Transform, which is also immutable. At present Transform contains a std::shared_ptr<const ast::FrameSet> _frameSet and I'm suggesting it also gain std::shared_ptr<const ast::Mapping _cachedMapping (where the name is supposed to suggest the reason for the duplication).
Hide
Paul Price added a comment -

So, the plan is to put _cachedMapping = _frameSet.getMapping(); and then everywhere we have _transform (or, at least for sky <--> pixel and linearize), we replace it with _cachedMapping? That would let SkyWcs do the parsing and validation at construction time, allowing us to just do "real" calculations when we're running measurement?

I'll give it a go!

Show
Paul Price added a comment - So, the plan is to put _cachedMapping = _frameSet.getMapping(); and then everywhere we have _transform (or, at least for sky <--> pixel and linearize), we replace it with _cachedMapping ? That would let SkyWcs do the parsing and validation at construction time, allowing us to just do "real" calculations when we're running measurement? I'll give it a go!
Hide
Paul Price added a comment -

Russell Owen, your idea worked wonderfully, cutting about 25% off the runtime for measureCoaddSources.py. Would you mind reviewing my implementation?

 price@pap-laptop:~/LSST/afw (tickets/DM-13847=) $git sub commit 2154218af71eecaca1e2dc8397accfcd0a4cee19 (HEAD -> tickets/DM-13847, origin/tickets/DM-13847) Author: Paul Price  Date: Fri Mar 23 00:00:34 2018 -0400    Transform: calculate Mapping at construction time    Much of the runtime performance degradation from SkyWcs appears to be  due to AST rebuilding the transformation every time we want to use it.  By calculating and caching the '_frameSet->getMapping' at construction  time, and using that for the coordinate transformations, we can cut  the corner and save a LOT of time.    Thanks to Russell Owen for the idea.    include/lsst/afw/geom/Transform.h | 5 +++--  src/geom/Transform.cc | 12 +++++++-----  2 files changed, 10 insertions(+), 7 deletions(-)  Show Paul Price added a comment - Russell Owen , your idea worked wonderfully, cutting about 25% off the runtime for measureCoaddSources.py . Would you mind reviewing my implementation? price@pap-laptop:~/LSST/afw (tickets/DM-13847=)$ git sub commit 2154218af71eecaca1e2dc8397accfcd0a4cee19 (HEAD -> tickets/DM-13847, origin/tickets/DM-13847) Author: Paul Price <price@astro.princeton.edu> Date: Fri Mar 23 00:00:34 2018 -0400   Transform: calculate Mapping at construction time Much of the runtime performance degradation from SkyWcs appears to be due to AST rebuilding the transformation every time we want to use it. By calculating and caching the '_frameSet->getMapping' at construction time, and using that for the coordinate transformations, we can cut the corner and save a LOT of time. Thanks to Russell Owen for the idea.   include/lsst/afw/geom/Transform.h | 5 +++-- src/geom/Transform.cc | 12 +++++++----- 2 files changed, 10 insertions(+), 7 deletions(-)
Hide
Paul Price added a comment -

Russell Owen reviewed, leaving comments on the GitHub PR which I've addressed; he also signed off on the exact additions in Slack, so I think he's just forgotten to hit "Review Complete".

Jenkins is green.

Merged to master.

Show
Paul Price added a comment - Russell Owen reviewed, leaving comments on the GitHub PR which I've addressed; he also signed off on the exact additions in Slack, so I think he's just forgotten to hit "Review Complete". Jenkins is green. Merged to master.

People

• Assignee:
Paul Price
Reporter:
Paul Price
Reviewers:
Russell Owen
Watchers:
Jim Bosch, Paul Price, Russell Owen, Tim Jenness