  1. Data Management
  2. DM-18494

Mimic the meas_base plugin system for use in ap_association, DiaObject summary metrics

    Details

    • Story Points:
      8
    • Sprint:
      AP S19-6, AP F19-1, AP F19-2
    • Team:
      Alert Production

      Description

      Currently all the summary statistics created in DM-18318 are haphazardly listed in a single function and run as a block. This ticket will make the summary statistics run configurable and extensible by mimicking the measurement plugin system that meas_base implements for its measurement tasks.
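
As a rough illustration of the idea in the description, the sketch below shows one way a registry of summary-statistic plugins could replace a single hard-coded function. It is not the meas_base or ap_association API: `PLUGIN_REGISTRY`, `register`, `SummaryPlugin`, `MeanFlux`, `FluxScatter` and `run_plugins` are all hypothetical names chosen for this example.

```python
from statistics import mean, pstdev

# Hypothetical registry (not the meas_base one): plugin name -> plugin class.
PLUGIN_REGISTRY = {}


def register(name):
    """Class decorator that adds a plugin class to the registry under `name`."""
    def decorator(cls):
        if name in PLUGIN_REGISTRY:
            raise ValueError(f"plugin {name!r} is already registered")
        PLUGIN_REGISTRY[name] = cls
        cls.name = name
        return cls
    return decorator


class SummaryPlugin:
    """Hypothetical base class: compute one summary statistic from fluxes."""
    name = None

    def calculate(self, fluxes):
        raise NotImplementedError


@register("mean_flux")
class MeanFlux(SummaryPlugin):
    def calculate(self, fluxes):
        return mean(fluxes)


@register("flux_scatter")
class FluxScatter(SummaryPlugin):
    def calculate(self, fluxes):
        # Population standard deviation of the flux measurements.
        return pstdev(fluxes)


def run_plugins(enabled, fluxes):
    """Run only the plugins named in `enabled`, in the order given."""
    return {name: PLUGIN_REGISTRY[name]().calculate(fluxes) for name in enabled}


# The list of enabled plugins plays the role of the task configuration.
result = run_plugins(["mean_flux", "flux_scatter"], [1.0, 2.0, 3.0])
```

With this shape, adding a statistic means registering a new class, and choosing which statistics run means editing the list of enabled names rather than the code.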

            Activity

            cmorrison Chris Morrison added a comment - - edited

            Jenkins run: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/30374/pipeline

            Created the initial measurement task and plugins with this ticket. Will spawn two tickets after this to: recreate all of the current DiaObject/time series features in the plugin system; and migrate AssociationTask to use the new plugins.

            cmorrison Chris Morrison added a comment - - edited

            Additionally, create two tickets triggered by this one: one to create the plugins, and one to migrate to the plugin system.

            swinbank John Swinbank added a comment -

            Thanks Chris — nice work!

            However, I have left a bunch of comments on GitHub, of a variety of levels of pickiness.

            I guess the big issue I don't understand is why we convert everything from a Pandas DataFrame to dicts, then convert them all back again at the end of the calculations. This seems to be less than ideal for a few reasons:

            • That conversion presumably isn't particularly fast;
            • I'm guessing (without testing) that it would be more efficient for the calculations to be done column-wise on the DataFrame rather than by iterating over each DIAObject individually;
            • The DataFrame has a well-structured data model (somewhat equivalent to the afw::table schema) which I think would actually be useful here (to avoid plugins clobbering each other's results, for example).

            I should add that I'm a total Pandas neophyte, so it might be that I've missed something obvious. Can you fill me in on the thinking?

            I think the answer to the above has a bearing on a bunch of my comments on GitHub — some of my comments there will become more or less important when I understand how Pandas fits in to the big picture — so it's probably not worth your while spending a lot of time on them until we've converged here.
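
To make the trade-off raised in this comment concrete, here is a small sketch contrasting a round trip through dicts with a column-wise calculation on the DataFrame. It is an illustration only, not code from ap_association: the table contents, the column names `diaObjectId` and `psFlux`, the output column names, and both function names are assumptions made for this example.

```python
import pandas as pd

# Hypothetical DiaSource table: one row per detection, keyed by diaObjectId.
dia_sources = pd.DataFrame({
    "diaObjectId": [1, 1, 1, 2, 2],
    "psFlux": [10.0, 12.0, 14.0, 5.0, 7.0],
})


def summarize_via_dicts(sources):
    """Per-object loop: convert rows to dicts, compute, rebuild a DataFrame."""
    per_object = {}
    for row in sources.to_dict("records"):
        per_object.setdefault(row["diaObjectId"], []).append(row["psFlux"])
    records = [
        {"diaObjectId": obj_id,
         "psFluxMean": sum(fluxes) / len(fluxes),
         "psFluxNdata": len(fluxes)}
        for obj_id, fluxes in per_object.items()
    ]
    return pd.DataFrame.from_records(records).set_index("diaObjectId")


def summarize_columnwise(sources):
    """Column-wise: one grouped aggregation, no round trip through dicts."""
    return sources.groupby("diaObjectId")["psFlux"].agg(
        psFluxMean="mean", psFluxNdata="count"
    )


via_dicts = summarize_via_dicts(dia_sources)
columnwise = summarize_columnwise(dia_sources)
```

Both functions produce the same per-object table here. The grouped version keeps the results in named DataFrame columns throughout, which is the property the comment suggests could help stop plugins from overwriting each other's outputs; whether it is actually faster on real data would need to be measured.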

            cmorrison Chris Morrison added a comment - - edited

            New Jenkins run after review: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/30413/pipeline

            swinbank John Swinbank added a comment -

            Some minor suggestions on GitHub; otherwise, good to go. Thanks!

            swinbank John Swinbank added a comment -

            For the record — Eric Bellm, Chris Morrison and I briefly touched on my concerns re Pandas in the comment above. The consensus was that it is more important to get the basic functionality working than to worry about optimization on this ticket.

            However, I do think we should make another pass through this code soon, to look at whether we're really making best use of Pandas and if we could use it to e.g. reinstate the multi plugin type, improve performance, simplify the code, etc.


              People

              • Assignee:
                cmorrison Chris Morrison
                Reporter:
                cmorrison Chris Morrison
                Reviewers:
                John Swinbank
                Watchers:
                Chris Morrison, Eric Bellm, John Swinbank