  Data Management / DM-16361

Optimize memory usage in MatchPessimisticB


    Details

    • Story Points:
      12
    • Sprint:
      AP S19-3
    • Team:
      Alert Production

      Description

      Colin Slater found memory usage problems with MatchPessimisticB in extremely dense reference fields. These come from the large data structures the code builds to make the searching steps as fast as possible. The complexity and size of these structures can be reduced, enabling quicker task creation and lower memory overhead at the cost of slightly slower matching. This ticket will implement that reduction and compare run times against the current implementation.

        Attachments

          Issue Links

            Activity

            John Swinbank added a comment -

            We agreed at our standup of 2018-10-30 that, assuming we get some numbers on DM-16360 that are consistent with the sort of savings that Chris Morrison [X] reckons he can achieve on this ticket, we'll address it in the November 2018 sprint.

            If DM-16360 indicates either that the existing code will be fine in any realistic scenario, or that even with aggressive optimization, PessimisticB will never be adequate, then we should rethink work on this ticket.

            Colin Slater added a comment -

            Originally reported in DM-15921.

            John Swinbank added a comment -

            This is one approach to the problem in DM-15921, but may not be the whole story: if, after implementing the optimizations described above, the memory use is still “excessive”, we'll claim credit for the work done here but keep DM-15921 open to record the problem.

            Chris Morrison [X] (Inactive) added a comment -

            After talking with Colin Slater today we made a plan to test and implement this.

            First, I'll establish a baseline for memory usage and processing time using ci_hsc and lsst_ci/DECam data.

            The first step will be reducing the precision of the data arrays from 64-bit to 32-bit floats. This should reduce the data size by roughly half. After this, the matcher will be run through CI again.
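            As a rough illustration of the down-cast (plain numpy; the array below is a stand-in, not the matcher's actual attribute):

                import numpy as np

                # Stand-in for the N x 3 unit-vector array the matcher stores
                # for the reference catalog.
                n_ref = 100_000
                ref_vectors = np.random.normal(size=(n_ref, 3))
                ref_vectors /= np.linalg.norm(ref_vectors, axis=1, keepdims=True)

                # float64 -> float32 halves the storage; float32's ~1e-7 relative
                # precision is still far finer than the matcher's angular tolerances.
                ref_vectors32 = ref_vectors.astype(np.float32)
                print(ref_vectors.nbytes, "->", ref_vectors32.nbytes, "bytes")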

            Next will be to remove all of the pre-computed, sorted 3-vector deltas stored in the matcher. This should reduce the matcher's memory usage by a factor of 5 in total without an appreciable increase in run time. The code could be merged in this state, with possible further improvements to come in another ticket if needed.
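            A sketch of what computing the deltas on demand could look like, assuming the matcher keeps only pair indices and pair distances (illustrative function names, not the actual implementation):

                import numpy as np

                def make_pair_index(vectors):
                    """Keep only (i, j) indices and pair distances, sorted by distance,
                    instead of a stored, sorted array of 3-vector deltas."""
                    ii, jj = np.triu_indices(len(vectors), k=1)
                    dists = np.linalg.norm(vectors[ii] - vectors[jj], axis=1).astype(np.float32)
                    order = np.argsort(dists)
                    return ii[order].astype(np.uint32), jj[order].astype(np.uint32), dists[order]

                def pair_delta(vectors, i, j):
                    """Recompute a single 3-vector delta on demand."""
                    return vectors[i] - vectors[j]

                vectors = np.random.normal(size=(1000, 3)).astype(np.float32)
                ii, jj, dists = make_pair_index(vectors)
                delta = pair_delta(vectors, ii[0], jj[0])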

            Chris Morrison [X] (Inactive) added a comment -

            Okay, here are the numbers so far, split into lsst_ci/DECam data and ci_hsc, for the state variables that the pessimistic matcher creates. The average number of reference objects per data set is 4886 for lsst_ci/DECam and 1676 for ci_hsc.

            Baseline mean memory:
                lsst_ci: 1276 MB
                ci_hsc: 150 MB

            Halve bit precision:
                lsst_ci: 638 MB
                ci_hsc: 75 MB

            Remove 3-vector deltas + halve bit precision:
                lsst_ci: 227 MB
                ci_hsc: 26 MB

            The total savings is then a factor of 5.6 reduction compared to the baseline. This comes at no cost in compute time for either sample; in the case of lsst_ci, the speed of the matcher was improved by a factor of 1.4, while the ci_hsc data showed only marginal improvement in matching time. Looking at the memory usage reported by `top` during the creation of searchable array data for 10k test objects, the difference in memory usage from the baseline to the best case is slightly less, at a factor of 4 reduction.
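            For reproducing this kind of measurement without watching `top`, one option is a quick sketch like the following (Unix-only; uses a smaller object count than the 10k quoted above to keep it light, and the array construction is a stand-in for the matcher's):

                import resource
                import numpy as np

                def peak_rss_mb():
                    # Peak resident set size of this process; Linux reports kilobytes.
                    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

                # Stand-in for the searchable-array construction being profiled.
                vectors = np.random.normal(size=(3000, 3)).astype(np.float32)
                ii, jj = np.triu_indices(len(vectors), k=1)
                dists = np.linalg.norm(vectors[ii] - vectors[jj], axis=1).astype(np.float32)

                print(f"peak RSS so far: {peak_rss_mb():.0f} MB")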

            With this result, I'll clean up the code, add a method to sub-select from the reference catalog if it is too long (including a unit test), and push to master (after review, of course). I can make a ticket for further investigation into optimizing the memory usage.
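            For the sub-selection, something along these lines might work (random down-sampling with a fixed seed; the function name and cut-off are placeholders, not the final API):

                import numpy as np

                def subselect_references(ref_array, max_refs, seed=1234):
                    """Randomly down-sample the reference data to at most max_refs rows
                    so the matcher's internal arrays stay bounded in size."""
                    if len(ref_array) <= max_refs:
                        return ref_array
                    rng = np.random.RandomState(seed)
                    keep = rng.choice(len(ref_array), size=max_refs, replace=False)
                    keep.sort()  # preserve the original row ordering of the kept entries
                    return ref_array[keep]

                # Example: cap a hypothetical 50k-row reference array at 10k rows.
                refs = np.random.normal(size=(50_000, 4))
                trimmed = subselect_references(refs, max_refs=10_000)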

            Chris Morrison [X] (Inactive) added a comment -

            Jenkins run, including ci_hsc: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/29341/pipeline/46/
            Chris Morrison [X] (Inactive) added a comment -

            Final Jenkins run after review: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/29361/pipeline/45

              People

              Assignee:
              Chris Morrison [X] (Inactive)
              Reporter:
              Chris Morrison [X] (Inactive)
              Reviewers:
              Eli Rykoff
              Watchers:
              Chris Morrison [X] (Inactive), Colin Slater, Eli Rykoff, Eric Bellm, John Swinbank

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.