  Data Management / DM-12389

Solicit feedback on DMTN-057

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Story Points: 6
    • Sprint: Alert Production F17 - 11, AP S18-1
    • Team: Alert Production

      Description

      DMTN-057 has been written and is available at https://dmtn-057.lsst.io/. Its proposals should now be discussed by stakeholders in verify, its clients (ap_verify and jointcal), and the Tasks framework.

      This ticket may be closed once I've responded to all feedback and had it reviewed. Once DMTN-057 is in its final form, an RFD will be issued to discuss the best proposal (or hybrid thereof) to adopt.


            Activity

            krzys Krzysztof Findeisen added a comment -

            Hi Jim Bosch, please review the corrections I've made in response to everyone's feedback. In particular, please let me know if there's anything in the new dataset-driven architecture that won't work with the current Tasks framework or doesn't fit your intent. Thanks!

            jbosch Jim Bosch added a comment - edited

            Looks good! Your description of the dataset-driven approach very much matches my (more vague) sense of how it would work, and you've accurately identified its strengths and weaknesses relative to the other approaches. I didn't see anything that wouldn't work with the current Task framework. Some minor comments follow (some of which may extend the conversation into RFD territory rather than suggest something to change in the tech note; I'm happy to follow your judgement on where to draw the line).

            On naming: while "measurement" is a heavily-overloaded term in our field, it - and in particular the name MeasurementTask - is very strongly associated in the codebase with source measurement. You could probably avoid some cognitive dissonance for some readers by replacing "MeasurementTask" with something else ("MetricReportingTask"?) here. (And this is out of scope, but I think we'll want to rename lsst.verify.Measurement itself before using it more broadly).

            Measurements may depend on information that is not present in the processed data. If this is the case, tasks can be passed a Job object for collecting measurements, which would then be persisted by a top-level task.

            My preferred way to deal with this would just be to add any information necessary for metrics to the task metadata, and thus ensure that it is present in the processed data. I think the fact that a value is useful for a metric is an indication that it's the sort of thing we ought to include in the metadata anyway. I do think this means we'd need to start thinking of the metadata keys provided by Tasks (or subtask slots) as part of their documented public interface, but I think that's broadly true of any of the metadata-based architectures you've proposed.
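
            As a rough sketch of what this could look like under the current interfaces (the task, metadata key, and metric name below are all invented for illustration): the task records the value it already knows via Task.metadata, and a later step turns the persisted value into a verify measurement.

                import astropy.units as u
                import lsst.pex.config as pexConfig
                import lsst.pipe.base as pipeBase
                from lsst.verify import Measurement

                class ExampleDiaSourceTask(pipeBase.Task):
                    """Hypothetical task that records a metric-relevant value
                    in its metadata as it runs.
                    """
                    ConfigClass = pexConfig.Config
                    _DefaultName = "exampleDiaSource"

                    def run(self, diaSources):
                        # ... real processing happens here ...
                        # Record the quantity alongside the task's other metadata.
                        self.metadata.add("numDiaSources", len(diaSources))
                        return pipeBase.Struct(diaSources=diaSources)

                def measurementFromMetadata(metadata):
                    """Build a verify Measurement from persisted task metadata.

                    `metadata` is assumed to be a dict-like view of the task's
                    persisted metadata; "ap_pipe.numDiaSources" is a made-up
                    metric name.
                    """
                    return Measurement("ap_pipe.numDiaSources",
                                       metadata["numDiaSources"] * u.count)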

            However, it has trouble supporting metrics dealing with particular algorithms; in effect, one needs one framework for data-driven metrics and a separate system for internal metrics. This duality makes the system both harder to maintain and harder to develop new metrics for.

            Agreed; this is a good statement of the big problem with the dataset-driven architecture: it's bad at algorithm-specific metrics.

            I think part of the mitigation is that we should prefer algorithm-agnostic metrics whenever possible (after all, our requirements are algorithm-agnostic). But there's definitely a need for a separate mechanism for capturing algorithm-specific quantities in this architecture, and it ought to be something more tightly integrated into Tasks.

            It might be worth thinking about whether algorithm-specific quantities are consumed in a manner similar enough to standard metrics to be worth using the verify classes at all. I see these as being useful for regression testing and diagnostic/debugging work, but I don't immediately see a need to be able to define external targets for them, for instance.

            krzys Krzysztof Findeisen added a comment - edited

            You could probably avoid some cognitive dissonance for some readers by replacing "MeasurementTask" with something else ("MetricReportingTask"?) here. (And this is out of scope, but I think we'll want to rename lsst.verify.Measurement itself before using it more broadly).

            Yes, I've found the term "Measurement" to be confusing to other members of the AP group. MetricReportingTask sounds a bit odd (to me, "reporting" suggests SQuaSH upload); perhaps ComputeMetricsTask?

            My preferred way to deal with this would just be to add any information necessary for metrics to the task metadata.

            I can add a mention of metadata as a possible solution, but I'm not convinced that it's the best one. In particular, metadata doesn't handle auxiliary information (notes and especially extras) very well.
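
            Concretely, this is the kind of structured auxiliary information I have in mind, using the existing Measurement and Datum classes (the metric name and values are invented); flattening it into metadata would mean devising a key-naming convention for every note and extra.

                import astropy.units as u
                from lsst.verify import Datum, Measurement

                # A measurement carrying notes and extras alongside its value.
                meas = Measurement(
                    "ap_assoc.AssociationTime",  # made-up metric name
                    1.42 * u.second,
                    notes={"estimator": "wall-clock time around AssociationTask.run"},
                    extras={
                        "numDiaSources": Datum(
                            1523 * u.count,
                            label="nDiaSources",
                            description="Number of DIASources associated",
                        ),
                    },
                )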

            I do think this means we'd need to start thinking of the metadata keys provided by Tasks (or subtask slots) as part of their documented public interface, but I think that's broadly true of any the metadata-based architectures you've proposed.

            I'd go farther and say that metadata must be considered part of a task's API, whether or not we need it for metrics. It is impossible to do anything useful with metadata if you don't know what keys to expect and what each means.

            I think the fact that a value is useful for a metric is an indication that it's the sort of thing we ought to include in the metadata anyway.

            I think part of the mitigation is that we should prefer algorithmic-agnostic metrics whenever possible (after all, our requirements are algorithm-agnostic)... It might be worth thinking about whether algorithm-specific quantities are consumed in a manner similar enough to standard metrics to be worth using the verify classes at all.

            I suspect this may be a fundamental mismatch in how the AP and DRP teams are thinking about metrics. While we will, of course, have specifications for metrics that correspond to official requirements, most of the metrics we're considering are motivated by their diagnostic value rather than a high-level requirement. From our point of view, the main value of verify and SQuaSH is that they let us monitor how the quality/performance of the code evolves as we work on it. For example, I could see the developer of a particular task adding new metrics as they identify new performance issues (e.g., a bottleneck in a specific implementation), essentially generalizing regression testing to quantities that don't have meaningful pass/fail thresholds.

            Do you think this is something that I should address more explicitly in the Introduction or Design Goals? I'm a bit worried that a misunderstanding about this could derail discussion between the AP/DRP groups.

            jbosch Jim Bosch added a comment -

            MetricReportingTask sounds a bit odd (to me, "reporting" suggests SQuaSH upload); perhaps ComputeMetricsTask?

            I can add mention of metadata as a possible solution, but I'm not convinced that it's the best one. In particular, metadata doesn't handle auxiliary information (notes and especially extras) very well.

            Perhaps we should upgrade the metadata collection to enable it to store this sort of more structured information, then? (Essentially making a hybrid of the metadata and verify.Job, and using that for all metadata, I suppose).
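
            Purely as a strawman for what such a hybrid might look like (nothing like this exists in the stack today):

                class StructuredMetadata:
                    """Toy stand-in for a hybrid metadata collection.

                    Plain key/value entries behave like PropertySet.add, while
                    addMeasurement stores a full lsst.verify.Measurement with
                    its notes and extras intact.
                    """

                    def __init__(self):
                        self._entries = {}
                        self._measurements = []

                    def add(self, name, value):
                        # Append rather than overwrite, mirroring PropertySet.add.
                        self._entries.setdefault(name, []).append(value)

                    def addMeasurement(self, measurement):
                        self._measurements.append(measurement)

                    @property
                    def measurements(self):
                        return list(self._measurements)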

            I'd go farther and say that metadata must be considered part of a task's API, whether or not we need it for metrics. It is impossible to do anything useful with metadata if you don't know what keys to expect and what each means.

            While we will, of course, have specifications for metrics that correspond to official requirements, most of the metrics we're considering are motivated by their diagnostic value rather than a high-level requirement.

            I think this probably just reflects my incomplete understanding of what verify is good at right now; I thought it at least strongly encouraged Metrics to be defined with targets. DRP certainly recognizes that there's a need for diagnostic reporting, and I think any difference between teams on this is probably just one of emphasis on metrics on specific algorithms (AP) vs. slots for algorithms (DRP). I think that's just a natural consequence of those two productions having very different challenges w.r.t. computational latency vs. algorithmic configurability, and probably not something to worry about in this discussion.


              People

              • Assignee: Krzysztof Findeisen
              • Reporter: Krzysztof Findeisen
              • Reviewers: Jim Bosch
              • Watchers: Jim Bosch, John Swinbank, Krzysztof Findeisen