Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-9979

Redesign afw::math::Statistics

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: afw
    • Labels:
      None
    • Story Points:
      10
    • Sprint:
      DRP S17-5, DRP S17-6
    • Team:
      Data Release Production

      Description

      The Statistics class predates many of the classes that we now use with it, and its design has slowly become a pain point that we tend to work around rather than fix. It's time for a redesign.

      One centerpiece of that redesign should probably be DM-2415: use ndarray instead of the image classes as the primary data class Statistics interacts with. That'd let us clean up a lot of helper classes (since it's easy to get an ndarray::Array from an Image, but hard to do the reverse in general). I'd also like to see Statistics interoperate with the new SpanSet functor interface somehow, which would let us compute statistics on a set of pixels whether those pixels are in an Image, a 2-d array, or a flattened 1-d array from e.g. a HeavyFootprint. And it'd be nice to address DM-3156 (support using a column from a Catalog even when that catalog is not contiguous) as well.

      We should also update how Statistics is configured: since the Statistics was written, we've switched to a new configuration package (pex_config; yes, at one point it was new) that brings with it a new convention for configuring C++ objects. We've also made it possible to use keywords arguments in Python interfaces, which may make some configuration interfaces palatable that were not before.

      Statistics also plays an important role in coaddition, because we use it to compute the value of each coadd pixel from the warped CCD-level pixels at the same position. There are currently some problems with this captured in DM-4158 (provide more options for statistics to use in coaddition) and DM-9953 (we need more control over how we use the mask values of input pixels to set the mask values over output pixels). So this redesign will also include redesigning (to perhaps a lesser degree) the afw::math::statisticsStack functions that do this work.

      Beyond those requirements, we of course need to keep at least most of the functionality of the current design; if you see something you think is vestigial, please check before removing it.

      I do not think we want to be constrained by the current interface in any way - as long as the same functionality is supported somehow, breaking backwards compatibility is acceptable (and expected).

      This ticket is to more concretely gather requirements from features of the current codebase and the above feature requests, and to produce an initial design sketch to meet those features. Implementation should be on one or more additional tickets.

      Robert Lupton, Russell Owen, Simon Krughoff: if you can think of any other tickets (or unticketed problems) that would put requirements on a Statistics design, please comment here.

        Attachments

          Issue Links

            Activity

            Hide
            jbosch Jim Bosch added a comment -

            Pim Schellart [X], I assume you're recommending I look at the DM-9979 branch of DMTN-043?

            Show
            jbosch Jim Bosch added a comment - Pim Schellart [X] , I assume you're recommending I look at the DM-9979 branch of DMTN-043?
            Hide
            pschella Pim Schellart [X] (Inactive) added a comment - - edited

            Jim Bosch, yes. The user branch is just the code I used for experimenting and benchmarking.
            You may find that useful to look at too though, just don't take it too seriously.

            Show
            pschella Pim Schellart [X] (Inactive) added a comment - - edited Jim Bosch , yes. The user branch is just the code I used for experimenting and benchmarking. You may find that useful to look at too though, just don't take it too seriously.
            Hide
            jbosch Jim Bosch added a comment -

            Lots of little comments here. Big-picture issues were discussed in person - summary is that this is a reasonable approach, and it'll probably be impossible to come up with a design we all love because it's a case where we really do want to do compile-time logic and that's always at least a bit ugly in C++.

            • When called with afw::image arguments, I think it makes sense to use an afw::image::Image for the weight argument as well (or at least a 2-d array rather than a 1-d array).
            • I believe the only reason afw::math::MaskedVector exists is for interoperability with statistics, and its implementation is a bit of a hack. I'd be quite happy to see it removed entirely as part of this redesign, or at least replaced with a simple struct or near-struct that just holds three 1-d ndarray::Arrays.
            • I assume Result uses NaN to represent values not computed?
            • Our conventions say Result's attributes should be available as both properties and methods in Python.
            • Result needs to be renamed - perhaps it should be called Statistics, or at least StatisticsResult?
            • The public standardStatistics function also needs to be renamed to start with a verb, perhaps to computeStatistics.
            • I'm a bit nervous about using additional overloads/specializations of the public function to implement the selection chain that converts the runtime arguments into template parameters. Could we move that down into a completely private function?
            • I think we want the core design to involve a Result object that stores all possible outputs (e.g. clipped and unclipped versions of the same statistics at the same time), but I'd like to add some convenience wrappers to make it easy to get e.g. clipped and unclipped versions of a particular statistic through the same output interface.
            Show
            jbosch Jim Bosch added a comment - Lots of little comments here. Big-picture issues were discussed in person - summary is that this is a reasonable approach, and it'll probably be impossible to come up with a design we all love because it's a case where we really do want to do compile-time logic and that's always at least a bit ugly in C++. When called with afw::image arguments, I think it makes sense to use an afw::image::Image for the weight argument as well (or at least a 2-d array rather than a 1-d array). I believe the only reason afw::math::MaskedVector exists is for interoperability with statistics, and its implementation is a bit of a hack. I'd be quite happy to see it removed entirely as part of this redesign, or at least replaced with a simple struct or near-struct that just holds three 1-d ndarray::Arrays . I assume Result uses NaN to represent values not computed? Our conventions say Result 's attributes should be available as both properties and methods in Python. Result needs to be renamed - perhaps it should be called Statistics , or at least StatisticsResult ? The public standardStatistics function also needs to be renamed to start with a verb, perhaps to computeStatistics . I'm a bit nervous about using additional overloads/specializations of the public function to implement the selection chain that converts the runtime arguments into template parameters. Could we move that down into a completely private function? I think we want the core design to involve a Result object that stores all possible outputs (e.g. clipped and unclipped versions of the same statistics at the same time), but I'd like to add some convenience wrappers to make it easy to get e.g. clipped and unclipped versions of a particular statistic through the same output interface.
            Hide
            swinbank John Swinbank added a comment -

            Before closing this, please ping Jonathan Sick to make sure it gets published to dmtn-043.lsst.io.

            Show
            swinbank John Swinbank added a comment - Before closing this, please ping Jonathan Sick to make sure it gets published to dmtn-043.lsst.io.
            Hide
            pschella Pim Schellart [X] (Inactive) added a comment -

            Merged and available at https://dmtn-043.lsst.io

            Show
            pschella Pim Schellart [X] (Inactive) added a comment - Merged and available at https://dmtn-043.lsst.io

              People

              • Assignee:
                pschella Pim Schellart [X] (Inactive)
                Reporter:
                jbosch Jim Bosch
                Reviewers:
                Jim Bosch
                Watchers:
                Jim Bosch, John Parejko, John Swinbank, Krzysztof Findeisen, Pim Schellart [X] (Inactive), Simon Krughoff
              • Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel