I've gone through a few iterations of profiling and optimization now. Most of the speedup came from
DM-13912, which improved the runtime by ~25-30% by switching from scipy.sparse matrices to analytic `LinearFilter` objects in scarlet. There was also a noticeable (~5-10%) speedup from using numpy.zeros in place of numpy.zeros_like, which does extra copying and bookkeeping (for example, to handle initializing arrays of strings).
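As a rough illustration of the zeros vs. zeros_like difference, the snippet below times both allocation paths for a small image-sized array. The shape and repetition count are arbitrary choices for demonstration, not values from the actual profiling run:

```python
import numpy as np
import timeit

# Arbitrary image-like shape for illustration.
shape = (101, 101)
template = np.empty(shape)

# np.zeros allocates and zero-fills the buffer directly.
t_zeros = timeit.timeit(lambda: np.zeros(shape), number=10000)

# np.zeros_like first inspects the template's dtype and memory layout and
# goes through a generalized fill path (needed e.g. for object/string
# dtypes), which adds per-call overhead.
t_zeros_like = timeit.timeit(lambda: np.zeros_like(template), number=10000)

print(f"np.zeros:      {t_zeros:.4f}s")
print(f"np.zeros_like: {t_zeros_like:.4f}s")
```

Both calls produce identical arrays here, so the swap is safe whenever the shape and dtype are already known.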
The above image shows the current profile generated by snakeviz. The function that takes the most time is apply_filters, which performs the fractional translations in x and y. This is not due to the speed of the function itself (it is about twice as fast as prox_weighted_monotonic, for example) but rather the number of times it is called: prox_weighted_monotonic is called once per source per iteration, while apply_filters is called ~7 times per source per iteration.
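For readers unfamiliar with what a fractional translation involves, here is a toy sketch of shifting an image by a sub-pixel amount along one axis with a two-tap linear-interpolation filter. This is only an illustration of the idea; the `fractional_shift` function is hypothetical, and scarlet's actual `LinearFilter` objects are more general:

```python
import numpy as np

def fractional_shift(img, dx):
    """Shift a 2D image right by a fractional amount 0 <= dx < 1 along
    axis 1, using a two-tap linear-interpolation kernel (1 - dx, dx).

    Toy sketch only -- not scarlet's implementation.
    """
    out = (1.0 - dx) * img
    # Each output pixel also picks up a dx-weighted copy of its left
    # neighbor, which is what shifts the flux by a fraction of a pixel.
    out[:, 1:] += dx * img[:, :-1]
    return out

# A point source at column 1, shifted right by a quarter pixel, spreads
# its flux across columns 1 and 2 in a 0.75/0.25 ratio.
img = np.zeros((1, 4))
img[0, 1] = 1.0
print(fractional_shift(img, 0.25))
```

A full 2-D translation applies the analogous filter along each axis, which is why apply_filters runs several times per source per iteration.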
Peter Melchior and I have discussed optimizing the proxmin package to improve the algorithm we use to fit the solution, now that we are no longer using the slower prox_g proximal operators. Once he implements those changes, we will have to decide how much time we want to spend trying to optimize scarlet further.
What is clear from the profile above is that no single function in the code will give us a large optimization gain. At this point there are just many small functions that cumulatively add up to a large processing time over many iterations (although a single iteration of scarlet takes only a few milliseconds per blend). On the other hand, there may be a few places where we can be more clever about how we calculate certain quantities.