Trimming out wasteful processing in CharacterizeImageTask

XMLWordPrintable

Details

• Type: RFC
• Resolution: Unresolved
• Component/s:
• Labels:
None

Description

CharacterizeImageTask runs immediately after IsrTask, and is primarily responsible for estimating a PSF model and measuring the aperture corrections for that PSF model on several measurement algorithms. To do this, it detects with a high threshold, deblends, and measures in order to obtain a preferably-isolated, high-purity catalog of moderately bright (but still unsaturated) stars, which is saved as the icSrc dataset. icSrc measurements are essentially never used downstream; it is spatially matched to what will become the src dataset in CalibrateTask in order to propagate PSF modeling flags, but it has no other consumers in the DRP pipeline graph. The set of source measurement plugins run by CharacterizationImageTask is thus the union of two sets:

• the set of plugins used to feed PSF modeling (including star selection);
• the set of plugins for which we need aperture corrections - this is essentially all photometry algorithms other than the fixed circular apertures, including many algorithms focused on galaxy photometry and hence not run again until after coaddition (we coadd aperture corrections in essentially the same way that we coadd PSFs).

A pseudocode version of CharacterizeImageTask looks like this:

 subtract_background() install_guess_psf() for i in range(n_iterations): # n_iterations=2 by default  repair()  detect()  deblend()  measure_sources()  measure_psf() repair() measure_sources() measure_aperture_corrections() apply_aperture_corrections() 

The Problem

There has never been a good reason to run the plugins needed only for aperture correction measurement inside the PSF estimation loop, but the structure of the task doesn't give us a way to configure those separately. Running these slow measurement plugins (especially CModel) currently dominates the runtime of the task.

I am also skeptical that we gain much from running PSF estimation twice; the measurement plugin results shouldn't affect the PSF estimation that much, though there probably is value from running detection again after PSF estimation and it's certainly necessary to run the measurement plugins again afterwards. But this is a question that needs an empirical answer, and that test would need to involve enough data to see if running PSF estimation twice helps with robustness in rare cases, even if it doesn't matter most of the time.

What's Changed

A few recent developments provide both motive and opportunity for fixing a lot of this wastefulness:

• AP has always cared about this task being fast, but they have recently started to put real effort into making that happen (and that's what's revealed how wasteful this task actually is).
• FAFF (and other quick-look commissioning use cases?) also want this task to be fast.
• We've recently switched from PSFEx to Piff for our default PSF measurement code. We believe Piff has a much higher ceiling in terms of PSF model quality, and it may already be better on HSC (a very active topic of investigation), but it's also a lot slower (30s vs. 1s, according to John Parejko), and I doubt we'll be able to recover all of that by working on optimization. So the cost of doing PSF modeling twice has just gone up, and while that 30s is probably tolerable for DRP (even if we'd like to shrink it), it is not tolerable for AP or FAFF, who also need to use CharacterizeImageTask.
• DRP has recently added a new FinalizeCharacterizationTask that re-runs PSF modeling, source measurement, and aperture correction measurement using a consistent cross-visit set of stars, after CalibrateTask (and someday, after background estimation, jointcal, and FGCM). The PSFs and aperture corrections we build coadds from now come from this task. The CharacterizeImageTask PSFs and aperture corrections are thus only used to perform measurements in CalibrateTask.

Proposal

1. Remove all source measurement plugins that are not run in CalibrateTask from CharacterizeImageTask; this should be the set that is currently run only to provide aperture corrections to coadd measurement, and it consists of:

• ext_convolved_ConvolvedFlux
• ext_gaap_GaapFlux
• modelfit_CModel
• modelfit_DoubleShapeletPsfApprox

2. Remove a few more source measurement plugins that are focused on galaxies (I'm not sure why CalibrateTask also runs these, but I'll consider that out of scope for now):

• ext_photometryKron_KronFlux
• ext_shapeHSM_HsmShapeRegauss

3. Remove a few more plugins that I suspect we don't need in icSrc (these are the ones I'm least sure about):

• base_FPPosition: I don't know what consumes this.
• base_SdssShape: mostly superseded by ext_shapeHSM_HsmSourceMoments, which is now slot_Shape.
• ext_shapeHSM_HsmPsfMomentsDebiased: this is a weak-lensing variant of ext_shapeHSM_HsmPsfMoments that I think we only need for "final" DRP PSFs.
• ext_shapeHSM_HsmSourceMomentsRound: this is a weak-lensing variant of ext_shapeHSM_HsmSourceMoments, again I think relevant only for final DRP PSFs.

4. Switch back to PSFEx in CharacterizeImageTask only. It's good enough for everything but DRP, and DRP will run Piff later.

Future Work

As part of what Eli Rykoff has dubbed "the great calibration factor", CharacterizeImageTask will probably go away entirely in the future, at least as part of DRP; we want to replace both it and CalibrateTask with a single task that operates only on bright-ish, isolated stars, with all of our full-depth source measurement happening only after we have our final PSFs (and backgrounds, and astrometry, and photometric calibrations).

But I don't regard fixing CharacterizeImageTask now as proposed by this RFC to be a waste; it's a configuration-only step in that direction, and I think it's the kind that will be increasingly rare in being good for DRP, AP, and commissioning/observing use cases.

Activity

Hide
Jim Bosch added a comment -

I think I'd prefer to get the PSF iterations 2->1 change on a different ticket - I wasn't actually proposing that in this RFC, and while it's something I'd like to do (and think is worth considering extending this RFC to include), I'm just not as confident in asserting that there won't be any fallout from that change.  I think it's something that might merit some test processing - mostly focused on robustness, not PSF quality, so just running CharacterizeImage on a ton of visits to look for failures.

Show
Jim Bosch added a comment - I think I'd prefer to get the PSF iterations 2->1 change on a different ticket - I wasn't actually proposing that in this RFC, and while it's something I'd like to do (and think is worth considering extending this RFC to include), I'm just not as confident in asserting that there won't be any fallout from that change.  I think it's something that might merit some test processing - mostly focused on robustness, not PSF quality, so just running CharacterizeImage on a ton of visits to look for failures.
Hide

I think using PSFEx instead of PiFF in C{{haracterizeImageTask}} should be fine, since PSFEx is not bad. As for a hypothetical use-case why one might want PiFF in C{{haracterizeImageTask}}, GAaP uses the PSF model to determine the kernel to convolve the image with, and using PSFEx in C{{haracterizeImageTask }}where aperture corrections happen but using PiFF models on coadds could potentially make the aperture corrections less accurate? This is not an objection to switching to PSFEx, but a potential use-case why a user might want to switch to using PiFF everywhere consistently, and we should probably do a sanity check once ourself before switching.

Show
Arun Kannawadi added a comment - I think using PSFEx instead of PiFF in C{{haracterizeImageTask}} should be fine, since PSFEx is not bad. As for a hypothetical use-case why one might want PiFF in C{{haracterizeImageTask}}, GAaP uses the PSF model to determine the kernel to convolve the image with, and using PSFEx in C{{haracterizeImageTask }}where aperture corrections happen but using PiFF models on coadds could potentially make the aperture corrections less accurate? This is not an objection to switching to PSFEx, but a potential use-case why a user might want to switch to using PiFF everywhere consistently, and we should probably do a sanity check once ourself before switching.
Hide
John Parejko added a comment -

I have filed 3 triggered tickets (make the default plugin list what we want+piff->psfex change; remove extra plugins from obs packages; running an even more shortened plugin list in the detectAndMeasurePsf loop), plus one more related ticket (psfIterations->1). I think that includes all the changes requested on this RFC?

Show
John Parejko added a comment - I have filed 3 triggered tickets (make the default plugin list what we want+piff->psfex change; remove extra plugins from obs packages; running an even more shortened plugin list in the detectAndMeasurePsf loop), plus one more related ticket (psfIterations->1). I think that includes all the changes requested on this RFC?
Hide
Jim Bosch added a comment - - edited

A blocker on implementing the switch back to PSFEx in CharacterizeImageTask for DRP, noticed by Yusra AlSayyad: we're currently still using (little-m) metrics calculated on the CharacterizeImageTask PSFs to determine which visits go into our coadds, despite the fact that it's the final PSFs whose models actually go in.  That's a problem in its own right, of course, but it would be exacerbated by making the final PSFs and the initial PSFs more different.

Addressing this might need to involve computing new visit summary statistics after final PSF estimation, and that might involve actually implementing full source re-measurement.  I don't recall exactly what goes into the coadd visit selection well enough to say for certain.

Show
Jim Bosch added a comment - - edited A blocker on implementing the switch back to PSFEx in CharacterizeImageTask for DRP, noticed by Yusra AlSayyad : we're currently still using (little-m) metrics calculated on the CharacterizeImageTask PSFs to determine which visits go into our coadds, despite the fact that it's the final PSFs whose models actually go in.  That's a problem in its own right, of course, but it would be exacerbated by making the final PSFs and the initial PSFs more different. Addressing this might need to involve computing new visit summary statistics after final PSF estimation, and that might involve actually implementing full source re-measurement.  I don't recall exactly what goes into the coadd visit selection well enough to say for certain.
Hide
Jim Bosch added a comment -

Thanks, John Parejko , for the triggering tickets.  This is overdue for adoption and I think uncontroversial, so I'm adopting it now.

Show
Jim Bosch added a comment - Thanks, John Parejko , for the triggering tickets.  This is overdue for adoption and I think uncontroversial, so I'm adopting it now.

People

Assignee:
Jim Bosch
Reporter:
Jim Bosch
Watchers:
Arun Kannawadi, Eli Rykoff, Eric Bellm, Ian Sullivan, Jim Bosch, John Parejko