Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-33514

Refactor pessimisticMatcher C++

    XMLWordPrintable

    Details

    • Type: Story
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: meas_astrom
    • Labels:

      Description

      Krzysztof Findeisen pointed out a number of possible refactorings in his review of DM-32299. I've paraphrased his suggestions here, so they don't hold up on merging that ticket. I don't know if all of these are worth implementing, and some might be good to split onto separate tickets, but quite a few are intertwined enough that it's probably worth doing most of them together.

      • A class that represents "a set of paired objects" to simplify the function inputs.
      • Cleanup uses of `find_candidate_reference_pair_range()` to only use the `ndarray::Array ref_dist_array` version (removing the one with std::vector).
      • Convert the ndarray args of `test_rotation` (and likely other functions too) to `Eigen::vector3d`. They are 3-vectors, and that should be enforced. It will also reduce the number of `ndarray::asEigenMatrix` calls.
      • Standardize the inputs to `create_pattern_spokes`, so that e.g., "all parameters described as 3-vectors should be Vector3d, while parameters that scale with the source catalogs should be Array".
      • Possibly replace `PatternResult` with a standalone class, or convert to pipe.base.Struct or map/dict in the pybind11 layer. Quote from Krzysztof's comment at [1].
      • Make `update_index` handle the entire loop iteration, so we don't need both `ref_dist_idx`/`midpoint` and `idx`, and instead ust get an appropriate iterator for those two inside-out for loops. Watch out for the shadowing of `idx` in the final inner loop just before `intermediate_verify_comparison` is called.
      • Further refactor the main loop of construct_pattern_and_shift_rot_matrix` into separate functions for each block,
        to help emphasize the individual components (which could each separately cause a loop `continue`). Related to this, cleanup the use of the `*tmp_ref_pair_list.begin()` iterator, and instead use the index checks (currently it uses both iterators and indexes).
      • `create_sorted_arrays` is quadratic in the length of reference_array: it might be worth changing to use std::sort. Krzysztof's comment at [2].

      1:

      Instead of forcing Python users to use PatternResult, it wouldn't be better to convert the return value of construct_pattern_and_shift_rot_matrix to a Python type such as pipe.base.Struct. I see two ways to do this:

      Use the Pybind11 API to import and construct a Struct directly in the wrapper. This approach is easier to understand, but clunky, and I don't know how to handle errors (in particular, failure to import Struct).
      Convert to a map or a built-in Python type such as dict, then create a pure Python customization module to convert that to a Struct. This approach is easier to write, but anybody looking at pessimisticPatternMatcherUtils.cc won't realize that the end result is a Struct.

      On the other hand, I do consider multiple return values an anti-pattern, so maybe it would be better to keep PatternResult and turn it into a standalone class rather than something tied directly to construct_pattern_and_shift_rot_matrix...

      2:

      This algorithm is quadratic is the length of reference_array, since insert is a linear operation for vector. Unfortunately, working with list instead of vector won't help, because computing ids_itr in less than linear time requires random access.

      If reference_array is likely to be long, consider working with std::sort instead, which is guaranteed to be log-linear. The downside is that unless you go to the trouble of caching the distance, you will have to compute it O(N log N) times rather than O(N). I don't know where the tradeoff lies in this case.

        Attachments

          Issue Links

            Activity

            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            Another thing on the to-do list is to consider changing the matcher IDs throughout meas_astrom from uint16_t to a descriptive type alias. This will avoid confusion with other uses of unsigned ints in meas_*, and provide an easy-to-find place to document the caveats about which integer type is to be used.

            Show
            krzys Krzysztof Findeisen added a comment - - edited Another thing on the to-do list is to consider changing the matcher IDs throughout meas_astrom from uint16_t to a descriptive type alias. This will avoid confusion with other uses of unsigned ints in meas_* , and provide an easy-to-find place to document the caveats about which integer type is to be used.

              People

              Assignee:
              Unassigned Unassigned
              Reporter:
              Parejkoj John Parejko
              Watchers:
              Ian Sullivan, John Parejko, Krzysztof Findeisen
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:

                  Jenkins Builds

                  No builds found.