I've gone through a few iterations of profiling and optimization now. Most of the speedup came from DM-13912, which improved the runtime by ~25-30% by switching from `scipy.sparse` matrices to analytic `LinearFilter` objects in scarlet. There was also a noticeable (~5-10%) speedup from using `numpy.zeros` in place of `numpy.zeros_like`, which performs an extra copy step to handle initializing arrays of strings.
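The `zeros` vs. `zeros_like` difference is easy to check directly. Below is a minimal micro-benchmark sketch (the array shape is made up for illustration, not taken from scarlet):

```python
import timeit

import numpy as np

# Illustrative shape only; the actual scarlet arrays differ.
shape = (51, 51)
template = np.empty(shape)

# np.zeros allocates and zero-fills directly.
t_zeros = timeit.timeit(lambda: np.zeros(shape), number=10000)
# np.zeros_like allocates with empty_like and then copies zeros in.
t_zeros_like = timeit.timeit(lambda: np.zeros_like(template), number=10000)

print(f"np.zeros:      {t_zeros:.4f} s")
print(f"np.zeros_like: {t_zeros_like:.4f} s")
```

Both produce identical arrays, so the swap is purely a performance change.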

The above image shows the current profile when running snakeviz. The function that takes the most time is `apply_filters`, which performs the fractional translations in x and y. This is not due to the speed of the function itself (it is about twice as fast as `prox_weighted_monotonic`, for example) but to the number of times it is called: `prox_weighted_monotonic` is called once per source per iteration, while `apply_filters` is called ~7 times per source per iteration.
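For reference, a profile like the one above can be produced with the standard-library `cProfile` module and inspected in snakeviz. This is a generic sketch with a stand-in workload; in practice the profiled call would be something like a scarlet fit, and the names here are illustrative only:

```python
import cProfile
import io
import pstats


def workload():
    # Stand-in for the real call being profiled (e.g. a scarlet fit).
    total = 0
    for i in range(10000):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Write stats to a file that snakeviz can open: `snakeviz profile.out`
profiler.dump_stats("profile.out")

# Or print the top entries by cumulative time directly.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Sorting by cumulative time is what surfaces functions like `apply_filters`, whose cost comes from call count rather than per-call speed.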

Peter Melchior and I have discussed optimizing the `proxmin` package to improve the algorithm used to fit the solution, now that we are no longer using the slower `prox_g` proximal operators. Once he implements those changes, we will have to decide how much time we want to spend trying to optimize scarlet further.

What is clear from the profile above is that there aren't any individual functions in the code that will give us large gains in optimization. At this point there are just a lot of small functions that cumulatively add up to a large processing time over many iterations (although a single iteration of scarlet is only a few milliseconds per blend). On the other hand, there might be a few places that we can be more clever about the way we calculate certain quantities.
