Krzysztof Findeisen pointed out a number of possible refactorings in his review of
DM-32299. I've paraphrased his suggestions here, so they don't hold up on merging that ticket. I don't know if all of these are worth implementing, and some might be good to split onto separate tickets, but quite a few are intertwined enough that it's probably worth doing most of them together.
- A class that represents "a set of paired objects" to simplify the function inputs.
- Cleanup uses of `find_candidate_reference_pair_range()` to only use the `ndarray::Array ref_dist_array` version (removing the one with std::vector).
- Convert the ndarray args of `test_rotation` (and likely other functions too) to `Eigen::vector3d`. They are 3-vectors, and that should be enforced. It will also reduce the number of `ndarray::asEigenMatrix` calls.
- Standardize the inputs to `create_pattern_spokes`, so that e.g., "all parameters described as 3-vectors should be Vector3d, while parameters that scale with the source catalogs should be Array".
- Possibly replace `PatternResult` with a standalone class, or convert to pipe.base.Struct or map/dict in the pybind11 layer. Quote from Krzysztof's comment at .
- Make `update_index` handle the entire loop iteration, so we don't need both `ref_dist_idx`/`midpoint` and `idx`, and instead ust get an appropriate iterator for those two inside-out for loops. Watch out for the shadowing of `idx` in the final inner loop just before `intermediate_verify_comparison` is called.
- Further refactor the main loop of construct_pattern_and_shift_rot_matrix` into separate functions for each block,
to help emphasize the individual components (which could each separately cause a loop `continue`). Related to this, cleanup the use of the `*tmp_ref_pair_list.begin()` iterator, and instead use the index checks (currently it uses both iterators and indexes).
- `create_sorted_arrays` is quadratic in the length of reference_array: it might be worth changing to use std::sort. Krzysztof's comment at .
Instead of forcing Python users to use PatternResult, it wouldn't be better to convert the return value of construct_pattern_and_shift_rot_matrix to a Python type such as pipe.base.Struct. I see two ways to do this:
Use the Pybind11 API to import and construct a Struct directly in the wrapper. This approach is easier to understand, but clunky, and I don't know how to handle errors (in particular, failure to import Struct).
Convert to a map or a built-in Python type such as dict, then create a pure Python customization module to convert that to a Struct. This approach is easier to write, but anybody looking at pessimisticPatternMatcherUtils.cc won't realize that the end result is a Struct.
On the other hand, I do consider multiple return values an anti-pattern, so maybe it would be better to keep PatternResult and turn it into a standalone class rather than something tied directly to construct_pattern_and_shift_rot_matrix...
This algorithm is quadratic is the length of reference_array, since insert is a linear operation for vector. Unfortunately, working with list instead of vector won't help, because computing ids_itr in less than linear time requires random access.
If reference_array is likely to be long, consider working with std::sort instead, which is guaranteed to be log-linear. The downside is that unless you go to the trouble of caching the distance, you will have to compute it O(N log N) times rather than O(N). I don't know where the tradeoff lies in this case.