Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-21937

Create new error-handling protocol for MetricTask

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: verify
    • Labels:

      Description

      Currently, running a MetricTask may either return a Measurement, return None, or raise an exception (preferably MetricComputationError). More details can be found in the documentation.

      Jim Bosch, in the DM-21885 discussion, said:

      I think we'll want to take a look at the expected-failure modes for MetricTasks you refer to in #3 as use cases for PipelineTasks in general, and define some rules that would allow them to work with generic activators. We've done a tiny bit of work in that area so far, but have long known that we need more sophistication in classifying and handling failures.

      Depending on how much these rules change the existing behavior, we may need to change the implementation of every concrete MetricTask, so story points are hard to estimate.

        Attachments

          Issue Links

            Activity

            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            We had a brief discussion about this on November 1. The outcome was:

            • Failures in the metric algorithm itself, which raise MetricComputationError, will eventually be handled by the activators' default behavior (i.e., keep running everything that doesn't depend on the failed task). However, handling exceptions emitted by PipelineTasks is not a priority for Gen 3 migration, so for now MetricTask should provide a runQuantum method that does the same handling that MetricsControllerTask does in Gen 2 (with suppression of Butler.put instead of Job.write).
            • Silently handling metrics that don't apply to a particular pipeline configuration is not easy, and we did not come up with a good long-term solution to that. It may be possible to work around this for now, using the current convention (return None) and a custom runQuantum.

            Something that just occurred to me now is that there is at least one case for "not applicable" where it's difficult to determine in advance, within the Gen 3 framework, that a metric is inapplicable: timing of tasks or subtasks that have been turned off in configs. This is currently determined by checking, at run time, which keys get written to (top-level) task metadata. Since I'm not clear on how metadata will work in Gen 3, and since Jonathan Sick has expressed interest in special-casing performance metrics that could apply to every task, it might be best to delay solving the broader problem until we know how this case works.

            See also DM-20902, where a raise of MetricComputationError was demoted to a "not applicable" because ap_verify users were getting scared by a non-preventable "error" showing up for the first image of a given field of view.

            Show
            krzys Krzysztof Findeisen added a comment - - edited We had a brief discussion about this on November 1. The outcome was: Failures in the metric algorithm itself, which raise MetricComputationError , will eventually be handled by the activators' default behavior (i.e., keep running everything that doesn't depend on the failed task). However, handling exceptions emitted by PipelineTasks is not a priority for Gen 3 migration, so for now MetricTask should provide a runQuantum method that does the same handling that MetricsControllerTask does in Gen 2 (with suppression of Butler.put instead of Job.write ). Silently handling metrics that don't apply to a particular pipeline configuration is not easy, and we did not come up with a good long-term solution to that. It may be possible to work around this for now, using the current convention (return None ) and a custom runQuantum . Something that just occurred to me now is that there is at least one case for "not applicable" where it's difficult to determine in advance, within the Gen 3 framework, that a metric is inapplicable: timing of tasks or subtasks that have been turned off in configs. This is currently determined by checking, at run time, which keys get written to (top-level) task metadata. Since I'm not clear on how metadata will work in Gen 3, and since Jonathan Sick has expressed interest in special-casing performance metrics that could apply to every task, it might be best to delay solving the broader problem until we know how this case works. See also DM-20902 , where a raise of MetricComputationError was demoted to a "not applicable" because ap_verify users were getting scared by a non-preventable "error" showing up for the first image of a given field of view.
            Hide
            krzys Krzysztof Findeisen added a comment -

            Per discussion with Eric Bellm, we'd generally want aggregate metrics that have only partial inputs to still run, and work from the subset of the input data that's actually available.

            Show
            krzys Krzysztof Findeisen added a comment - Per discussion with Eric Bellm , we'd generally want aggregate metrics that have only partial inputs to still run, and work from the subset of the input data that's actually available.
            Hide
            krzys Krzysztof Findeisen added a comment -

            I'm un-blocking DM-21911 on this, since we have a workaround (MetricTask.runQuantum). I'm leaving the blocker on DM-21885 for now as a reminder to revisit this discussion as Gen 3 development moves forward.

            Show
            krzys Krzysztof Findeisen added a comment - I'm un-blocking DM-21911 on this, since we have a workaround ( MetricTask.runQuantum ). I'm leaving the blocker on DM-21885 for now as a reminder to revisit this discussion as Gen 3 development moves forward.
            Hide
            krzys Krzysztof Findeisen added a comment -

            Gen 3 development has moved forward, but it's now looking like special handling of edge cases is not going to be a priority for a long time. I'm marking this ticket as Done (by workaround), since there's unlikely to be a good time to do more on this front.

            Show
            krzys Krzysztof Findeisen added a comment - Gen 3 development has moved forward, but it's now looking like special handling of edge cases is not going to be a priority for a long time. I'm marking this ticket as Done (by workaround), since there's unlikely to be a good time to do more on this front.

              People

              • Assignee:
                krzys Krzysztof Findeisen
                Reporter:
                krzys Krzysztof Findeisen
                Watchers:
                Jim Bosch, Krzysztof Findeisen, Nate Lust
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: