Most of the work that I put into this ticket is in the new Appendix C of DMTN-036 (pdf, PR). But that's just a detailed discussion of one of the priorities, performance optimization (and specifically one way to go about addressing it). I'll just put my full list below:
1. Gen3/PipelineTask conversion. This may be blocked on work by Nate Lust/Jim Bosch (who may also do some of the direct work in jointcal, as well as adding support functionality), but when it isn't, this is my highest priority. We should resist the urge to do other cleanups at this stage; the goal is to make it possible to deprecate Gen2.
2. Performance optimization. This is more about demonstrating that jointcal can scale than addressing a pressing need - while running HSC UltraDeep is painful right now, I think we could live with that for a while if we were confident we can make it better in the future, because HSC Wide performance is okay. But until we have something we're confident can scale, I'm not comfortable making many other changes. There are a few different ways to go about trying to speed things up, and some of them actually involve functionality we want for other reasons. Do not read too much into my ordering of bullets below; I'd like to get John Parejko's thoughts on that rather than impose my own. The first thing to do, though, is to look at and internalize the plots on DM-23252, and ask Nate Lust about any questions they raise.
- Perform due diligence on the existence of other sparse Cholesky solvers, especially parallel ones. If we can find a drop-in replacement that's faster, we should give it a try before doing anything else. I don't know of one, and a quick Google search didn't find anything but papers, but that doesn't mean one doesn't exist.
- Try to reduce the number of outliers that need to be rejected at each step. The first thing I'd try here is filtering out likely outliers before even starting the fit; maybe there are some existing flags that are a good predictor of whether we will reject something later? After that, we could try just being more or less aggressive, or perhaps increasing how aggressive we are only after a few iterations (so we don't reject good matches just because they were far from some intermediate solution). The other option to try here is to add motion parameters, on the assumption that most of our outliers just have high proper motions. More on that below, but note that while this makes sense scientifically, it's also quite possible that adding more parameters would slow down the rank-update step enough to cancel out any benefits we gain from invoking it less. Using motion parameters from the reference catalog without fitting them would be a good middle ground.
- Switch from sparse to a reduced dense problem, as per Appendix C of DMTN-036. That approach should scale much better with the number of parameters per star or number of stars (so good to do it before adding motion parameters), but it lacks an analog to the rank-update step, so it will scale with the number of outlier rejection iterations, not the number of outliers rejected. That means we might benefit from doing some outlier rejection work before this, but we might also do some of the outlier rejection work differently once this is in place.
3. Use stellar motion parameters from the reference catalog.
4. Fit for stellar motion parameters.
5. (Probably not actually particularly near-term, given all of the above, but it looks like what's next up, and it may inform earlier work...) Rework I/O and matching, splitting those into a separate PipelineTask that outputs a matched catalog. That catalog would be used not just by jointcal, but by FGCM, other future PipelineTasks (e.g. PSF re-estimation from a consistent catalog of stars), and a lot of validation code. It might be generated several steps before jointcal is run, and it will be a match across all bands, even if jointcal continues to only fit one at a time. This will be necessary to get DCR-corrected positions (which require colors) into jointcal. This may not actually involve reusing matching code from jointcal - we have other code (e.g. FGCM) doing similar things that we could borrow from instead, and I'd like this step to use Parquet/Pandas for both inputs and outputs rather than the kind of data structures jointcal uses internally. But there is certainly jointcal work involved (at least) in adapting it to use a pre-matched catalog as input. Eventually we would also want jointcal to "update" this catalog (probably by writing a new catalog with the same rows and a few new columns) with its best per-object positions.
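
To make the "reduced dense problem" bullet under item 2 a bit more concrete, here is a minimal NumPy sketch of one generic way to get there: Schur-complement elimination of the per-star parameters from the normal equations. I'm assuming (not quoting from Appendix C) that the normal matrix has the block form [[A, B], [B^T, D]], with A for the shared (e.g. per-visit) parameters and D block-diagonal with one small block per star. All of the function and variable names here are hypothetical, and none of this is jointcal code.

```python
import numpy as np


def solve_reduced(A, B_blocks, D_blocks, g_p, g_s_blocks):
    """Solve [[A, B], [B^T, D]] [dp, ds] = [g_p, g_s] with D block-diagonal.

    Hypothetical helper: eliminates each star's parameters so that only a
    dense system in the shared parameters has to be factorized.
    """
    S = A.copy()      # becomes the reduced matrix A - sum_i B_i D_i^-1 B_i^T
    rhs = g_p.copy()  # becomes g_p - sum_i B_i D_i^-1 g_i
    for B_i, D_i, g_i in zip(B_blocks, D_blocks, g_s_blocks):
        # One small solve per star gives D_i^-1 [B_i^T | g_i].
        X = np.linalg.solve(D_i, np.column_stack([B_i.T, g_i]))
        S -= B_i @ X[:, :-1]
        rhs -= B_i @ X[:, -1]
    dp = np.linalg.solve(S, rhs)  # dense solve, shared parameters only
    # Back-substitute for each star's parameters.
    ds = [np.linalg.solve(D_i, g_i - B_i.T @ dp)
          for B_i, D_i, g_i in zip(B_blocks, D_blocks, g_s_blocks)]
    return dp, ds


def toy_problem(n_shared=4, n_stars=6, k=2, n_obs=5, seed=0):
    """Build random normal-equation blocks for a made-up linear fit."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n_shared, n_shared))
    g_p = np.zeros(n_shared)
    B_blocks, D_blocks, g_s_blocks = [], [], []
    for _ in range(n_stars):
        J_p = rng.normal(size=(n_obs, n_shared))  # d(residual)/d(shared)
        J_s = rng.normal(size=(n_obs, k))         # d(residual)/d(this star)
        r = rng.normal(size=n_obs)                # residuals for this star
        A += J_p.T @ J_p
        g_p += J_p.T @ r
        B_blocks.append(J_p.T @ J_s)
        D_blocks.append(J_s.T @ J_s)
        g_s_blocks.append(J_s.T @ r)
    return A, B_blocks, D_blocks, g_p, g_s_blocks


A, B_blocks, D_blocks, g_p, g_s_blocks = toy_problem()
dp, ds = solve_reduced(A, B_blocks, D_blocks, g_p, g_s_blocks)
```

In this sketch the only large factorization is of the dense reduced matrix, whose size is set by the shared parameters alone; each star contributes just a small solve. That is why adding parameters per star (e.g. motion parameters) should be comparatively cheap here. The reduced matrix is rebuilt and re-solved from scratch on each call, so in this form the cost is paid once per outlier-rejection iteration.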