Data Management / DM-18416

Develop metric for assessing an image's contribution to DcrModel

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Story Points: 12
    • Sprint: AP S19-5, AP F19-5 (October), AP F19-6 (November)
    • Team: Alert Production

      Description

      Current OpSim simulations suggest there are relatively few images per field in the u and g bands in the first years of the LSST survey, which may make it challenging to construct usable DCR models. This ticket is to develop a metric that characterizes the relative value of a specific new image, taken at a specific airmass and parallactic angle (given the past pointing history), in improving the DCR model and hence reducing the total number of DCR-induced false positives.

            Activity

            Ian Sullivan added a comment:

            The new notebook defines two different approaches for calculating a DCR metric. The metric can be used to evaluate whether a given set of observations will be sufficient to constrain the DCR model, and whether a new science observation will be well matched by a DCR model.

            I think the second, KDE-based, approach is what we should use, though the simple histogram approach has some advantages. As part of your review, please provide a recommendation of which we should move forward with, and any major features or modifications you would like to see.
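The notebook's own code is not reproduced in this ticket, so the following is only a minimal sketch of what a histogram-style and a KDE-style coverage metric could look like, assuming each visit has already been reduced to a single number (its "visit measure") on an assumed range of [-1, 1]. The function names, the bin count, and the kernel bandwidth are hypothetical choices for illustration. The KDE variant here is an unnormalized Gaussian kernel sum rather than `scipy.stats.gaussian_kde`, which normalizes to unit area and therefore would not grow as visits are added.

```python
import numpy as np


def histogram_metric(visit_measures, measure_range=(-1.0, 1.0), n_bins=10):
    """Hypothetical: fraction of visit-measure bins holding at least one visit.

    The range and bin count are illustrative assumptions; visits outside
    measure_range are ignored by np.histogram.
    """
    counts, _ = np.histogram(visit_measures, bins=n_bins, range=measure_range)
    return np.count_nonzero(counts) / n_bins


def kde_metric(visit_measures, measure_range=(-1.0, 1.0), bandwidth=0.1, n_grid=201):
    """Hypothetical: mean kernel coverage of the range, capped at 1 per grid point.

    Each visit adds an unnormalized Gaussian bump with peak height 1, so adding
    visits can never lower the metric. The bandwidth is an assumption.
    """
    grid = np.linspace(measure_range[0], measure_range[1], n_grid)
    measures = np.asarray(visit_measures, dtype=float)
    offsets = (grid[:, None] - measures[None, :]) / bandwidth
    coverage = np.exp(-0.5 * offsets**2).sum(axis=1)
    # Saturate so that a region counts as "covered" once it reaches 1.
    return float(np.clip(coverage, 0.0, 1.0).mean())


clustered = [0.0, 0.02, -0.01, 0.03]    # all visits with small visit measure
spread = [-0.9, -0.45, 0.0, 0.45, 0.9]  # visits across the full assumed range
print(histogram_metric(clustered), histogram_metric(spread))
print(kde_metric(clustered), kde_metric(spread))
```

Both versions score the spread-out set higher. The histogram version is simple but jumps when a visit crosses a bin edge; the kernel version is smooth, at the cost of choosing a bandwidth instead.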

            You can find the notebook here:
            https://github.com/lsst-dm/ap_pipe-notebooks/blob/tickets/DM-18416/DM-18416-DCR-metric-development.ipynb

            Eric Bellm added a comment:

            Hi Ian Sullivan, nice work. A few comments:

            The "visit measure" idea is a good insight--reducing the problem to 1D makes it a lot easier to think about.
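The ticket does not spell out how the visit measure is defined. As a sketch under stated assumptions only: if the atmosphere is treated as plane-parallel (airmass = sec z, so tan z = sqrt(airmass² - 1)) and the DCR direction is projected onto one fixed sky axis using the parallactic angle, each visit collapses to a single signed number. The function `visit_measure` and this particular projection are hypothetical and are not necessarily what the DM-18416 notebook uses.

```python
import numpy as np


def visit_measure(airmass, parallactic_angle_deg):
    """Hypothetical 1D summary of a visit's DCR.

    Assumes a plane-parallel atmosphere (airmass = sec z), so the DCR
    amplitude scales with tan(z) = sqrt(airmass**2 - 1), and projects the
    shift onto a fixed axis with the cosine of the parallactic angle.
    """
    airmass = np.asarray(airmass, dtype=float)
    if np.any(airmass < 1.0):
        raise ValueError("airmass must be >= 1")
    tan_zenith = np.sqrt(airmass**2 - 1.0)
    return tan_zenith * np.cos(np.radians(parallactic_angle_deg))


# Same airmass, opposite parallactic angles land at opposite ends of the axis.
print(visit_measure([1.0, 2.0, 2.0], [30.0, 0.0, 180.0]))
```

Under this assumed definition, a visit at zenith sits at 0 regardless of parallactic angle, while visits at high airmass spread toward the extremes, which is why the extremes carry the largest shifts.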

            Your two proposed solutions assume that the DCR model is best constrained by approximately uniform sampling in visit measure. That's not obvious to me, though: isn't most of the information in the extreme values, where the shifts are the largest?

            If uniform sampling is the goal, rather than raw histograms, I wonder if a comparison of the empirical cumulative distribution function to the ideal (straight) CDF would be more effective. It could be bin-independent, I think. (Something like a K-S test, or some measure on a Q-Q plot.) I'm not sure if it would give a larger metric value with more data points, though, so perhaps it's inferior to your KDE approach. Think about it for a little bit, but you only need to try it if it seems like a useful step forward.
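A bin-independent uniformity check along these lines can be sketched with the Kolmogorov-Smirnov statistic: the largest vertical gap between the empirical CDF of the visit measures and the straight-line CDF of a uniform distribution. The function name and the [-1, 1] range are assumptions for illustration; `scipy.stats.kstest(x, "uniform", args=(lo, hi - lo))` should report the same two-sided statistic if SciPy is preferred.

```python
import numpy as np


def ks_uniform_statistic(visit_measures, measure_range=(-1.0, 1.0)):
    """Hypothetical KS distance between the empirical CDF of the visit
    measures and the uniform CDF on measure_range. Smaller is more uniform."""
    lo, hi = measure_range
    x = np.sort(np.asarray(visit_measures, dtype=float))
    n = x.size
    if n == 0:
        raise ValueError("need at least one visit")
    ideal = np.clip((x - lo) / (hi - lo), 0.0, 1.0)
    # The empirical CDF steps from (i - 1)/n up to i/n at the i-th sorted visit,
    # so the gap is checked just after and just before each step.
    d_plus = np.max(np.arange(1, n + 1) / n - ideal)
    d_minus = np.max(ideal - np.arange(n) / n)
    return float(max(d_plus, d_minus))


print(ks_uniform_statistic([-0.75, -0.25, 0.25, 0.75]))  # evenly spaced visits
print(ks_uniform_statistic([0.0] * 4))                    # clustered visits
print(ks_uniform_statistic([0.0] * 40))                   # 10x more clustered visits
```

This illustrates the concern above: the statistic needs no bins, but 40 clustered visits score exactly the same as 4, so it measures the shape of the distribution rather than how much data there is.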

            Minor items:

            I would describe the metrics you are calculating in words (or equations) in the notebook, not just in code.

            Please add docstrings throughout your metric classes, to save future humans from determining what the arguments are.

            Ian Sullivan added a comment:

            You are right that the observations with the largest DCR constrain the model the most, but not if that's all you have. In general, a new observation constrains the model most if it samples a new region of "visit measure" space, and new observations at high airmass tend to have fewer (or no) observations near them. I've added a new plot to the notebook that illustrates how new observations with different visit measures would contribute to constraining the model. I've also added more description about the metric, and added docstrings throughout.
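The idea that a new visit helps most when it lands in an empty region of visit-measure space can be sketched with a simple gap score: the distance from each candidate visit to its nearest existing visit, mapped smoothly onto [0, 1). This nearest-neighbor score is a swapped-in proxy for illustration, not the notebook's method; the function name and `bandwidth` scale are assumptions.

```python
import numpy as np


def novelty_score(existing, candidates, bandwidth=0.1):
    """Hypothetical gap score for candidate visits in visit-measure space.

    Returns 0 for a candidate that duplicates an existing visit and
    approaches 1 once its nearest neighbor is several bandwidths away.
    """
    existing = np.asarray(existing, dtype=float)
    candidates = np.atleast_1d(np.asarray(candidates, dtype=float))
    if existing.size == 0:
        # With no history, every candidate fills a completely empty space.
        return np.ones_like(candidates)
    nearest = np.min(np.abs(candidates[:, None] - existing[None, :]), axis=1)
    return 1.0 - np.exp(-0.5 * (nearest / bandwidth) ** 2)


history = [0.0, 0.05, -0.05, 0.1]  # visits clustered at small visit measure
print(novelty_score(history, [0.0, 0.8]))
```

With a history clustered near zero, a candidate far out in visit measure (typically a high-airmass visit) scores near 1 while a repeat of an existing visit scores 0. Unlike a full DCR model, this proxy ignores how large the shift at each position is.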

            Ian Sullivan added a comment:

            Final notebook is here: https://github.com/lsst-dm/ap_pipe-notebooks/blob/master/DM-18416-DCR-metric-development.ipynb

              People

              • Assignee: Ian Sullivan
              • Reporter: Eric Bellm
              • Reviewers: Eric Bellm
              • Watchers: Eric Bellm, Ian Sullivan (2)
              • Votes: 0
