Applying aperture corrections in ForcedPhotCoadd seems to be taking a disturbingly long time on full-depth DC2 data.
I would expect the time it takes to evaluate an aperture correction on the coadd to scale linearly with the number of input epochs, but I would not have expected it to be this slow. I suspect slow, non-vectorized AST evaluations inside SkyWcs are a big part of it, but I'm not sure (AST calls are much faster if you evaluate many points at once). Given that we are evaluating multiple CoaddBoundedField objects (one for each aperture-corrected algorithm) at the same set of points, it should also be possible to reduce the number of such calls with some kind of caching or reorganization.
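To make the reorganization idea concrete, here is a minimal pure-Python sketch. It does not use the real SkyWcs or CoaddBoundedField APIs: CountingTransform, make_field, and both evaluate functions are hypothetical stand-ins, and the real code has one transform per input epoch rather than a single one. It only shows how transforming the shared points once, in a single batched call, cuts the number of transform calls compared to transforming point by point for every field.

```python
class CountingTransform:
    """Hypothetical stand-in for an expensive coordinate transform (not SkyWcs)."""

    def __init__(self):
        self.calls = 0

    def apply(self, points):
        # One call handles any number of points; we count calls, not points.
        self.calls += 1
        return [(x + 0.5, y - 0.5) for x, y in points]


def make_field(scale):
    """Hypothetical stand-in for one per-algorithm field."""
    return lambda x, y: scale * (1.0 + 0.001 * x + 0.002 * y)


def evaluate_per_point(transform, fields, points):
    """Transform each point separately, once per field."""
    results = {}
    for name, field in fields.items():
        values = []
        for point in points:
            x, y = transform.apply([point])[0]
            values.append(field(x, y))
        results[name] = values
    return results


def evaluate_shared(transform, fields, points):
    """Transform all points once, then reuse them for every field."""
    transformed = transform.apply(points)
    return {
        name: [field(x, y) for x, y in transformed]
        for name, field in fields.items()
    }


points = [(float(i), float(2 * i)) for i in range(100)]
fields = {"algA": make_field(1.0), "algB": make_field(1.1), "algC": make_field(0.9)}

naive = CountingTransform()
naive_result = evaluate_per_point(naive, fields, points)

shared = CountingTransform()
shared_result = evaluate_shared(shared, fields, points)

print(naive.calls, shared.calls)
```

Whether this pattern actually helps here depends on where the time goes, which is what the profiling should tell us.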
Those are all just hunches - we should probably start with some profiling, at least at the Python level, because that's easy. That may pinpoint the problem well enough that a harder-to-obtain C++ profile isn't needed. But we need to make sure we test on a patch of the appropriate depth to see the problem; this Slack post seems to include some data IDs and collections that we may be able to use to reproduce it:
i.e. tract=4431, skymap='DC2', patch=3, band='i'
Kenneth Herner, that's a pointer to your u/kherner/2.2i/runs/tract4431-w40 collection at NCSA. Could you make sure it sticks around until this ticket is done?
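For the Python-level profiling mentioned above, a minimal sketch using only the standard-library cProfile and pstats modules might look like the following. The profile_call helper and slow_placeholder function are hypothetical names for illustration; the placeholder would be swapped for whatever callable actually runs the forced photometry on the data ID above.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so expensive call chains show up first.
    stats.sort_stats("cumulative").print_stats(20)
    return result, stream.getvalue()


def slow_placeholder(n):
    """Hypothetical stand-in for the code under test."""
    return sum(i * i for i in range(n))


result, report = profile_call(slow_placeholder, 100000)
print(report)
```

A Python-level profile like this only attributes time to the Python-visible call into C++, so it can show that a wrapped call is slow but not what inside it is slow.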