Ok, I think the last thing outstanding was disk space for the new footprints. Looking at the deblender output from w_2022_20, the total size of all the deblendedFlux catalogs in rc2_subset (one per band) was 1.5G. For this PR the catalog is 13M, plus another 99M for the scarlet models, for a total of 112M. So the deblender outputs alone save an order of magnitude of disk space, even with an inefficient file format like JSON.
Since the meas and forced_src catalogs mostly contain data columns, and there is still an output in each band, the savings isn't as large but is still substantial (1.9G before vs 787M with this PR for meas, 2.0G vs 343M for forced_src). I don't understand why the savings is larger for forced_src, since presumably the heavy footprints take up the same amount of disk space in both.
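To make the relative savings concrete, here is the arithmetic on the sizes quoted above (sizes in MB; the dict names are just for illustration):

```python
# Disk usage before vs. after this PR, from the numbers quoted above (MB).
before = {"deblendedFlux": 1500, "meas": 1900, "forced_src": 2000}
after = {"deblendedFlux": 13 + 99, "meas": 787, "forced_src": 343}

for name in before:
    ratio = before[name] / after[name]
    print(f"{name}: {before[name]} MB -> {after[name]} MB ({ratio:.1f}x smaller)")
```

This works out to roughly 13x for the deblender outputs, 2.4x for meas, and 5.8x for forced_src.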
In terms of the time to re-generate the footprints, it's a little complicated, but the short answer is that creating the PSF image is the most expensive part. Using %time in a Jupyter notebook, it takes ~15-30ms (with a mode of ~20ms) to compute the PSF image for each blend. But once the PSF has been evaluated at a given coordinate it appears to be cached, so creating the kernel image again takes only O(10^-6 s). Creating the footprints scales roughly linearly with the number of sources: after getting the PSF once to cache it and remove it from the timing, creating footprints (convolving each source, re-distributing the flux, and creating the HeavyFootprints) for a blend with 15 sources takes ~20ms (~1.4ms/source), while creating footprints for a blend with 2 sources takes ~3.4ms (~1.1ms/source). So speeding up the calculation of the PSF image would go a long way, but even as-is we're looking at under 2 minutes to generate footprints for a catalog with ~25k sources (see below).
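The caching behavior described above can be sketched with a stdlib memoization wrapper (hypothetical names; the real stack caches the PSF kernel image internally, this just illustrates why the repeat evaluation drops to O(10^-6 s)):

```python
# Minimal sketch of PSF-image caching: the first evaluation at a position
# pays the full cost, repeat evaluations are served from a cache.
import functools
import time

@functools.lru_cache(maxsize=None)
def compute_psf_image(x, y):
    """Stand-in for an expensive per-position PSF evaluation (~20 ms)."""
    time.sleep(0.02)  # simulate the expensive first computation
    return (x, y)     # placeholder for the kernel image

t0 = time.perf_counter()
compute_psf_image(100, 200)   # first call: pays the full ~20 ms cost
first = time.perf_counter() - t0

t0 = time.perf_counter()
compute_psf_image(100, 200)   # same position: cache hit, microseconds
cached = time.perf_counter() - t0

print(f"first: {first:.4f} s, cached: {cached:.6f} s")
```

The upshot is that the ~20ms PSF cost is paid once per position, not once per source, which is why the per-source cost above is ~1ms.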
The last consideration is that now that the footprints and catalogs are stored separately, any downstream task that loads a catalog but doesn't need footprints (which is basically everything downstream of forced photometry in drp_pipe) saves both the time and the data transfer of loading the footprints. That should also be substantial, and helps recover the cost of re-creating the footprints in measurement and forced photometry.
Timings for rc2_subset (~25k sources):

| operation | time |
| --- | --- |
| footprint creation/source (after psf generation) | |
| footprint creation/source (including psf generation) | |
| load catalog before | |
| load and transform catalog with this PR | |
- This is also the total time added to the pipeline by this change, since footprints are only re-generated in measurement and forced photometry, and they are already created once after deblending in the current pipeline.
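As a sanity check on the "under 2 minutes" claim, here is the back-of-envelope scaling of the per-source costs measured above to the full rc2_subset catalog (the 1.1-1.4ms range comes from the two blend timings quoted earlier, and excludes the first PSF evaluation per position):

```python
# Back-of-envelope: per-source footprint-creation cost applied to ~25k sources.
n_sources = 25_000
per_source_ms = (1.1, 1.4)  # measured range, after the PSF is cached

low, high = (n_sources * t / 1000 for t in per_source_ms)
print(f"~{low:.0f}-{high:.0f} s per catalog (excluding first PSF evaluations)")
```

That gives roughly 28-35 s of pure footprint creation, leaving comfortable headroom under the 2-minute figure even with the per-blend PSF evaluations included.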