# Optimize memory usage in MatchPessimisticB


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Story Points: 12
• Sprint: AP S19-3
• Team: Alert Production

#### Description

Colin Slater found memory usage problems with MatchPessimisticB in extremely dense reference fields. This is due to the large data structures the code creates to make the searching steps as fast as possible. The complexity and size of these structures could be reduced, enabling quicker task creation and lower memory overhead at the cost of slightly slower matching. This ticket will implement this reduction in complexity and compare against previous run times.

#### Activity

John Swinbank added a comment -

We agreed at our standup of 2018-10-30 that, assuming we get some numbers on DM-16360 that are consistent with the sort of savings that Chris Morrison [X] reckons he can achieve on this ticket, we'll address it in the November 2018 sprint.

If DM-16360 indicates either that the existing code will be fine in any realistic scenario, or that even with aggressive optimization, PessimisticB will never be adequate, then we should rethink work on this ticket.

Colin Slater added a comment -

Originally reported in DM-15921.

John Swinbank added a comment -


This is one approach to the problem in DM-15921, but may not be the whole story: if, after implementing the optimizations described above, the memory use is still “excessive”, we'll claim credit for the work done here but keep DM-15921 open to record the problem.

Chris Morrison [X] (Inactive) added a comment -

After talking with Colin Slater today we made a plan to test and implement this.

First, I'll establish a baseline for memory usage and processing time using ci_hsc and lsst_ci/DECam data.

The first step will be reducing the precision of the data arrays from 64-bit to 32-bit floats. This should reduce the data size by roughly half. After this, the matcher will be run through CI again.
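The precision-halving step described above can be sketched as follows. This is a hypothetical illustration, not the actual matcher code; the array name and sizes are made up.

```python
import numpy as np

# Hypothetical illustration of the downcast: NumPy allocates float64 by
# default, so converting the matcher's coordinate arrays to float32
# halves their storage (4 bytes per value instead of 8).
ref_xyz = np.random.default_rng(0).normal(size=(4886, 3))
assert ref_xyz.dtype == np.float64

ref_xyz32 = ref_xyz.astype(np.float32)

# Memory drops by exactly half for the same number of elements.
assert ref_xyz32.nbytes * 2 == ref_xyz.nbytes
```

The trade-off is reduced precision in the pattern-matching arithmetic, which the plan above accepts as long as CI still passes.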

Next will be removing all pre-computed, sorted 3-vector deltas stored in the matcher. This should reduce the matcher's memory usage by a factor of 5 in total without an appreciable increase in run time. The work could be merged in this state, with possible further improvements to come in another ticket if needed.
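The idea of dropping the stored deltas can be sketched as below: rather than keeping a pre-computed array of pairwise 3-vector differences, compute the deltas for one point on demand. This is a simplified, hypothetical sketch (function name and shapes are illustrative), trading a small amount of recomputation for much lower resident memory.

```python
import numpy as np

rng = np.random.default_rng(1)
ref_xyz = rng.normal(size=(1676, 3)).astype(np.float32)

def deltas_for(idx, points):
    """Return 3-vector deltas from points[idx] to every other point.

    Computed on the fly, so only O(N) memory is live per query instead
    of an O(N^2) pre-computed table of all pairwise deltas.
    """
    d = points - points[idx]
    return np.delete(d, idx, axis=0)

d0 = deltas_for(0, ref_xyz)
assert d0.shape == (len(ref_xyz) - 1, 3)
```

Because the deltas are no longer sorted and cached, each lookup does a little more arithmetic, which matches the "slightly slower matching" trade-off described in the ticket.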

Chris Morrison [X] (Inactive) added a comment -

Okay, here are the numbers so far split into lsst_ci/DECam data and ci_hsc for the state variables that the pessimistic matcher creates. For each data set the average number of reference objects is lsst_ci/DECam: 4886, ci_hsc: 1676.

| Configuration | lsst_ci | ci_hsc |
| --- | --- | --- |
| Baseline mean memory | 1276 MB | 150 MB |
| Halved bit precision | 638 MB | 75 MB |
| Removed 3-vector deltas + halved bit precision | 227 MB | 26 MB |

The total savings is then a 5.6× reduction compared to the baseline. This comes at no cost in compute time for either sample. In the case of lsst_ci, the speed of the matcher was improved by a factor of 1.4. The ci_hsc data showed only a marginal improvement in matching time. Looking at the memory usage reported by top during the creation of searchable array data for 10k test objects, the difference from the baseline to the best case is slightly less: a factor of 4 reduction in memory usage.

With this result, I'll clean up the code, add a method to sub-select from the reference catalog if it is too long (including a unit test), and push to master (after review, of course). I can make a ticket for further investigation into optimizing the memory usage.
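The planned sub-selection step could look something like the sketch below. This is a hypothetical implementation, not the actual stack code; the function name, threshold, and seed are illustrative. A seeded generator keeps the sub-selection reproducible between runs.

```python
import numpy as np

def subselect_references(ref_xyz, max_refs, seed=42):
    """Return at most max_refs rows of ref_xyz, sampled without replacement.

    If the reference catalog is already small enough, it is returned
    unchanged; otherwise a reproducible random subset bounds the
    matcher's memory in extremely dense reference fields.
    """
    if len(ref_xyz) <= max_refs:
        return ref_xyz
    rng = np.random.default_rng(seed)
    keep = rng.choice(len(ref_xyz), size=max_refs, replace=False)
    return ref_xyz[keep]

catalog = np.zeros((10000, 3), dtype=np.float32)
trimmed = subselect_references(catalog, max_refs=4000)
assert trimmed.shape == (4000, 3)
```

Sampling without replacement keeps the spatial distribution of references roughly intact, which matters more for matching than keeping any particular star.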

Chris Morrison [X] (Inactive) added a comment -

Jenkins run, including ci_hsc: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/29341/pipeline/46/
Chris Morrison [X] (Inactive) added a comment -

Final Jenkins run after review: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/29361/pipeline/45

#### People

Assignee:
Chris Morrison [X] (Inactive)
Reporter:
Chris Morrison [X] (Inactive)
Reviewers:
Eli Rykoff
Watchers:
Chris Morrison [X] (Inactive), Colin Slater, Eli Rykoff, Eric Bellm, John Swinbank


#### Jenkins

No builds found.