Data Management / DM-22299

Speed up specific diaCalculation plugins using fast pandas functionality

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: ap_association
    • Labels:
      None

      Description

      While working on DM-21688, it was discovered that certain plugins in diaCalculationPlugins could be sped up by a factor of ~100 by using some of the pandas built-in functions.
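The kind of change described here can be illustrated with a minimal sketch. It assumes a toy DiaSource table with hypothetical `diaObjectId` and `psFlux` columns and contrasts calling a Python function once per group (via `apply`) with pandas' built-in groupby reduction. It is not the plugin code itself.

```python
import numpy as np
import pandas as pd

# Hypothetical toy DiaSource table: one row per source, grouped by a
# (hypothetical) "diaObjectId" column, with a (hypothetical) "psFlux" column.
rng = np.random.default_rng(42)
n_objects, n_per_object = 1000, 20
diaSources = pd.DataFrame({
    "diaObjectId": np.repeat(np.arange(n_objects), n_per_object),
    "psFlux": rng.normal(100.0, 5.0, n_objects * n_per_object),
})
grouped = diaSources.groupby("diaObjectId")

# Slow pattern: a Python-level function is called once per group via apply.
def sigma_flux(fluxes):
    # Sample standard deviation (ddof=1), matching pandas' default for std().
    return np.std(fluxes.to_numpy(), ddof=1)

slow = grouped["psFlux"].apply(sigma_flux)

# Fast pattern: pandas' built-in (Cython) groupby reduction over all groups.
fast = grouped["psFlux"].std()

# Both give the same values up to floating-point round-off.
print(np.allclose(slow, fast))  # True
```

The speed difference comes from avoiding per-group Python calls. The results agree only to round-off, not bit for bit.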

      This ticket will implement these simple changes and compare the speed of these plugins against that of the current stack. The plugins to change are:

      • NumDiaSourcesDiaPlugin
      • SimpleSourceFlagDiaPlugin
      • SigmaDiaPsFlux
      • MinMaxDiaPsFlux
      • ErrMeanDiaPsFlux
      • SigmaDiaTotFlux
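As a rough illustration of how the statistics behind these plugins map onto pandas built-ins, here is a minimal sketch using named aggregation. The column names and output names are hypothetical. Two mappings are assumptions that this ticket does not confirm: that ErrMeanDiaPsFlux corresponds to the standard error of the mean (`sem`), and that SimpleSourceFlagDiaPlugin flags an object when any of its sources is flagged.

```python
import numpy as np
import pandas as pd

# Hypothetical DiaSource table; all column names are illustrative assumptions.
diaSources = pd.DataFrame({
    "diaObjectId": [1, 1, 1, 2, 2],
    "psFlux":      [10.0, 12.0, 11.0, 5.0, 7.0],
    "totFlux":     [20.0, 22.0, 21.0, 9.0, 11.0],
    "flags":       [0, 0, 4, 0, 0],
})

grouped = diaSources.groupby("diaObjectId")

# Each statistic is a single vectorized groupby reduction.
summary = grouped.agg(
    nDiaSources=("psFlux", "count"),   # cf. NumDiaSourcesDiaPlugin (non-null count)
    psFluxSigma=("psFlux", "std"),     # cf. SigmaDiaPsFlux (sample std, ddof=1)
    psFluxMin=("psFlux", "min"),       # cf. MinMaxDiaPsFlux
    psFluxMax=("psFlux", "max"),
    psFluxErrMean=("psFlux", "sem"),   # cf. ErrMeanDiaPsFlux (assumed: std / sqrt(N))
    totFluxSigma=("totFlux", "std"),   # cf. SigmaDiaTotFlux
)

# cf. SimpleSourceFlagDiaPlugin (assumed: 1 if any source of the object is flagged).
summary["flags"] = grouped["flags"].any().astype(int)

print(summary)
```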

      This is around 30% of the plugins currently implemented, so one would naively expect roughly a 30% decrease in the run time of the diaCalculation step. Pandas GroupBy/DataFrame processing also provides functions for skew and percentiles, but these do not appear to be as highly optimized as the calculations above.
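The naive estimate follows from Amdahl's law. This assumes (the ticket does not say so) that every plugin takes an equal share of the run time. If a fraction p of the run time is sped up by a factor s, then with p = 0.3 and s = 100:

```latex
\frac{T_\text{new}}{T_\text{old}} = (1 - p) + \frac{p}{s}
  = (1 - 0.3) + \frac{0.3}{100} = 0.703
```

The saving is bounded by p, not s. Even an infinite speed-up of these plugins leaves 70% of the original run time under this assumption.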

            Activity

            cmorrison Chris Morrison added a comment - edited

            Jenkins: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/30854/pipeline/47

            cmorrison Chris Morrison added a comment -

            Tested the calculations using pandas built-in functions on the GroupBy objects against the current calculations. Found a roughly 10% increase in speed for ~400 DiaObjects. Below are comparison plots of the time it takes to run all DiaCalculationPlugins versus the number of DiaObjects updated/created.

            For the current stack: [plot not available]

            For the new pandas-based calculations: [plot not available]
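A comparison of run time versus the number of DiaObjects could be reproduced along these lines. This is a minimal sketch, not the harness used for the plots. It times only a single statistic (per-object sample std of a hypothetical `psFlux` column) rather than all DiaCalculationPlugins, and the helper names are hypothetical.

```python
import time

import numpy as np
import pandas as pd


def make_dia_sources(n_objects, n_per_object=20, seed=0):
    """Build a toy DiaSource table (hypothetical columns)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "diaObjectId": np.repeat(np.arange(n_objects), n_per_object),
        "psFlux": rng.normal(100.0, 5.0, n_objects * n_per_object),
    })


def time_call(func, repeats=3):
    """Best-of-`repeats` wall-clock time of func(), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def compare(n_objects_list):
    """Time per-group apply vs. the built-in groupby std for each table size."""
    rows = []
    for n in n_objects_list:
        grouped = make_dia_sources(n).groupby("diaObjectId")["psFlux"]
        t_apply = time_call(
            lambda: grouped.apply(lambda f: np.std(f.to_numpy(), ddof=1)))
        t_builtin = time_call(lambda: grouped.std())
        rows.append({"nDiaObjects": n, "apply_s": t_apply, "builtin_s": t_builtin})
    return pd.DataFrame(rows)


results = compare([10, 100, 400])
print(results)
```

Absolute timings depend on the machine and pandas version. Only the trend with the number of DiaObjects is meaningful.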
             

            eggl Siegfried Eggl added a comment -

            Several functions defined in the previous version of the code were replaced with pandas built-in equivalents. The result is a non-negligible speed-up, as shown in the figures attached to this ticket. Precision requirements for the unit tests were discussed and agreed upon. The changes in the code are approved.
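Because built-in reductions may accumulate round-off differently from hand-written ones, a unit test comparing the two should use a tolerance rather than exact equality. The sketch below assumes a hypothetical `psFlux` column, hypothetical helper names, and a hypothetical `rtol=1e-8`. The tolerance actually agreed upon is not stated in the ticket.

```python
import unittest

import numpy as np
import pandas as pd


def sigma_flux_reference(diaSources):
    """Per-object sample std computed one group at a time (reference)."""
    return diaSources.groupby("diaObjectId")["psFlux"].apply(
        lambda f: np.std(f.to_numpy(), ddof=1))


def sigma_flux_builtin(diaSources):
    """Same statistic via pandas' built-in groupby reduction."""
    return diaSources.groupby("diaObjectId")["psFlux"].std()


class TestSigmaFlux(unittest.TestCase):
    def test_builtin_matches_reference(self):
        rng = np.random.default_rng(1)
        diaSources = pd.DataFrame({
            "diaObjectId": np.repeat(np.arange(50), 10),
            "psFlux": rng.normal(1e4, 50.0, 500),
        })
        expected = sigma_flux_reference(diaSources)
        result = sigma_flux_builtin(diaSources)
        # Hypothetical tolerance: agreement to round-off, not bit-for-bit.
        np.testing.assert_allclose(result.to_numpy(), expected.to_numpy(), rtol=1e-8)


if __name__ == "__main__":
    unittest.main(argv=["sigma-flux-test"], exit=False)
```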


              People

              • Assignee:
                cmorrison Chris Morrison
                Reporter:
                cmorrison Chris Morrison
                Reviewers:
                Siegfried Eggl
                Watchers:
                Chris Morrison, Eric Bellm, John Swinbank, Siegfried Eggl
              • Votes:
                0
                Watchers:
                4
