# Reduce memory usage in MatchPessemisticB

XMLWordPrintable

#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
4
• Sprint:
AP F21-4 (September), AP F21-5 (October)
• Team:
• Urgent?:
No

#### Description

Breaking up DM-29515 into smaller tickets. This ticket is for implementing the first part of the plan laid out in DM-31312.

#### Activity

No builds found.
Chris Morrison [X] (Inactive) created issue -
Chris Morrison [X] (Inactive) made changes -
Field Original Value New Value
Link This issue relates to DM-29515 [ DM-29515 ]
Chris Morrison [X] (Inactive) made changes -
 Link This issue relates to DM-31312 [ DM-31312 ]
Chris Morrison [X] (Inactive) made changes -
 Status To Do [ 10001 ] In Progress [ 3 ]
Hide
Chris Morrison [X] (Inactive) added a comment - - edited

Ran through roughly 40k detectorVisits of the HSC RC2 dataset though the w_2021_39 version of calibrate.py and this ticket's updated matcher with reduced pre-computation and memory usage.

I wrapped a timer around the solveAndFit loop in AstrometryTask and found these headline numbers in seconds:

w_2021_39:
Mean: 3.389, STD: 4.112
Percentiles:    5%,   25%,  50%,  75%,    95%
1.148, 1.546, 2.107, 3.513, 10.473
Min: 0.699, Max: 178.790

this ticket:
Mean: 3.451, STD: 4.542
Percentiles:    5%,  25%,   50%,   75%,    95%
1.045, 1.422, 2.003, 3.560, 11.313
Min: 0.630, Max: 201.265

Overall, not a significantly different change in timing and up till 75% the timing is faster or about just as fast.

The question remaining for this ticket is if this is worth merging now and reap the benefit of a smaller memory footprint with the slightly slower performance for the mean and outlier data or wait for a ticket to rewrite one of the inner loops in C++/pybind11 to gain the slight speed hit back.

Show
Chris Morrison [X] (Inactive) added a comment - - edited Ran through roughly 40k detectorVisits of the HSC RC2 dataset though the w_2021_39 version of calibrate.py and this ticket's updated matcher with reduced pre-computation and memory usage. I wrapped a timer around the solveAndFit loop in AstrometryTask and found these headline numbers in seconds: w_2021_39: Mean: 3.389, STD: 4.112 Percentiles:    5%,   25%,  50%,  75%,    95%                      1.148, 1.546, 2.107, 3.513, 10.473 Min: 0.699, Max: 178.790 this ticket: Mean: 3.451, STD: 4.542 Percentiles:    5%,  25%,   50%,   75%,    95%                     1.045, 1.422, 2.003, 3.560, 11.313 Min: 0.630, Max: 201.265 Overall, not a significantly different change in timing and up till 75% the timing is faster or about just as fast. The question remaining for this ticket is if this is worth merging now and reap the benefit of a smaller memory footprint with the slightly slower performance for the mean and outlier data or wait for a ticket to rewrite one of the inner loops in C++/pybind11 to gain the slight speed hit back.
Hide
Chris Morrison [X] (Inactive) added a comment -
Show
Chris Morrison [X] (Inactive) added a comment - Jenkins: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/35094/pipeline
Chris Morrison [X] (Inactive) made changes -
 Reviewers Lauren MacArthur [ lauren ] Status In Progress [ 3 ] In Review [ 10004 ]
 Epic Link PREOPS-650 [ 714056 ] Sprint AP F21-4 (September) [ 1113 ]
Hide
Ian Sullivan added a comment -

This seems fine to merge now, since it does help the memory use and the timing is very close to the old performance. I assume DM-32008 is the ticket for the C++ speedup?

Show
Ian Sullivan added a comment - This seems fine to merge now, since it does help the memory use and the timing is very close to the old performance. I assume DM-32008 is the ticket for the C++ speedup?
Hide
Lauren MacArthur added a comment -

I think the above timings look acceptable given the impending speedups that this ticket helps facilitate. While I try to figure out what is actually going on with these code changes (being embarrassingly unfamiliar with memory optimization strategies!), could you please provide some additional diagnostics:

• the memory savings achieved here (for a representative example)
• proof that the results come out identical
Show
Lauren MacArthur added a comment - I think the above timings look acceptable given the impending speedups that this ticket helps facilitate. While I try to figure out what is actually going on with these code changes (being embarrassingly unfamiliar with memory optimization strategies!), could you please provide some additional diagnostics: the memory savings achieved here (for a representative example) proof that the results come out identical
Hide
Chris Morrison [X] (Inactive) added a comment -

Sure. I already check the results with the CI dataset and found they came out the same. Even going to far as to make sure the arrays in the input later functions were the same.

I'll show how much the memory usage is reduced by by checking the difference in how much is stored in the look up arrays in self before and after the change. Sound good?

Show
Chris Morrison [X] (Inactive) added a comment - Sure. I already check the results with the CI dataset and found they came out the same. Even going to far as to make sure the arrays in the input later functions were the same. I'll show how much the memory usage is reduced by by checking the difference in how much is stored in the look up arrays in self before and after the change. Sound good?
Hide
Lauren MacArthur added a comment -

Sounds perfect, thanks!

Show
Lauren MacArthur added a comment - Sounds perfect, thanks!
 Sprint AP F21-4 (September) [ 1113 ] AP F21-4 (September), AP F21-5 (October) [ 1113, 1119 ]
Hide
Chris Morrison [X] (Inactive) added a comment -

Ran through the HiTS and HSC dataset from the ap_verify ci datasets. For HiTS the static of memory stored in lookup tables. is reduced by a factor of ~2.5. For HSC the reduction is a factor of 2.375

Overall the memory stored in self of the class is reduced by a factor of ~2.5 when including all variables stored in the class's self.

Show
Chris Morrison [X] (Inactive) added a comment - Ran through the HiTS and HSC dataset from the ap_verify ci datasets. For HiTS the static of memory stored in lookup tables. is reduced by a factor of ~2.5. For HSC the reduction is a factor of 2.375 Overall the memory stored in self of the class is reduced by a factor of ~2.5 when including all variables stored in the class's self.
Hide
Lauren MacArthur added a comment -

Thanks for the numbers...an improvement, indeed.  I've left a few minor comments on the PR, but otherwise, LGTM.

Show
Lauren MacArthur added a comment - Thanks for the numbers...an improvement, indeed.  I've left a few minor comments on the PR, but otherwise, LGTM.
 Status In Review [ 10004 ] Reviewed [ 10101 ]
Hide
Chris Morrison [X] (Inactive) added a comment -
Show
Chris Morrison [X] (Inactive) added a comment - Jenkins after review: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/35117/pipeline
Chris Morrison [X] (Inactive) made changes -
 Resolution Done [ 10000 ] Status Reviewed [ 10101 ] Done [ 10002 ]

#### People

Assignee:
Chris Morrison [X] (Inactive)
Reporter:
Chris Morrison [X] (Inactive)
Reviewers:
Lauren MacArthur
Watchers:
Chris Morrison [X] (Inactive), Eric Bellm, Ian Sullivan, Lauren MacArthur