Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-17593

Investigate non-subtask “subtasks” in ip_diffim

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: ip_diffim
    • Labels:
      None
    • Story Points:
      8
    • Epic Link:
    • Sprint:
      AP S19-3
    • Team:
      Alert Production

      Description

      Per DM-12558, ip_diffim creates and uses Task objects without going through the regular subtask mechanism. Why? If this is necessary, we should document it and properly account for it on DM-12558. If it's not necessary, we should fix it by using makeSubtask so we can keep track of the execution time, etc, of these tasks in the regular way.

        Attachments

          Issue Links

            Activity

            Hide
            gkovacs Gabor Kovacs added a comment - - edited

            As of 2019-02-13, ImageDifferenceTask cmd line task follows the flowcharts ImageDifference_flowchart-Page-1.pdf ImageDifference_flowchart-Page-2.pdf .

            • The chart shows the main config options, most (all?) of them are task-level options. Many of these config options are also passed to subtasks as constructor arguments and they may not be defined (in the sense of pexConfig) as config options of the subtask itself.
            • Subtasks calls to run() and other entry points occur. There are cases of multiple calls to the same subtask instance.
            • The green boxes show calls to subtasks that are registered by makeSubtask at the ImageDifferenceTask construction.
            • The yellow boxes show subtask class instantiations on the spot.
            • The blue boxes indicate where algorithmic code or calls to them are inlined directly in the task level source code.
            • Empty boxes are classes within ip_diffim.
            • The structure suggests that the AL implementation initially followed a well structured subtasking, later pre- and post- subtraction functionality were added directly to the task code and these parts were not delegated to subtasks. This is particularly applies to the AL pre-subtraction processing and the common steps following (either the ZOGY or AL) subtraction.
            • The zogy implementation is a proper subtask but within its implementation similar on the spot subtask creation techniques are used.
            • The "retargeting" configuration feature of subtasks is used only for the getTemplate.
            • ZogyImagePsfMatchTask and ImagePsfMatchTask are accessed by the subtractAlgorithmRegistry as `zogy` and `al` respectively.

            The flowchart is currently stored in github lsst-dm/diffimTests/figure_subtasks.

            Show
            gkovacs Gabor Kovacs added a comment - - edited As of 2019-02-13, ImageDifferenceTask cmd line task follows the flowcharts ImageDifference_flowchart-Page-1.pdf ImageDifference_flowchart-Page-2.pdf . The chart shows the main config options, most (all?) of them are task-level options. Many of these config options are also passed to subtasks as constructor arguments and they may not be defined (in the sense of pexConfig) as config options of the subtask itself. Subtasks calls to run() and other entry points occur. There are cases of multiple calls to the same subtask instance. The green boxes show calls to subtasks that are registered by makeSubtask at the ImageDifferenceTask construction. The yellow boxes show subtask class instantiations on the spot. The blue boxes indicate where algorithmic code or calls to them are inlined directly in the task level source code. Empty boxes are classes within ip_diffim . The structure suggests that the AL implementation initially followed a well structured subtasking, later pre- and post- subtraction functionality were added directly to the task code and these parts were not delegated to subtasks. This is particularly applies to the AL pre-subtraction processing and the common steps following (either the ZOGY or AL) subtraction. The zogy implementation is a proper subtask but within its implementation similar on the spot subtask creation techniques are used. The "retargeting" configuration feature of subtasks is used only for the getTemplate . ZogyImagePsfMatchTask and ImagePsfMatchTask are accessed by the subtractAlgorithmRegistry as `zogy` and `al` respectively. The flowchart is currently stored in github lsst-dm/diffimTests/figure_subtasks.
            Hide
            gkovacs Gabor Kovacs added a comment -

            I think the investigation work is now complete with the flowchart and the comment. Could you please review the ticket?

            Show
            gkovacs Gabor Kovacs added a comment - I think the investigation work is now complete with the flowchart and the comment. Could you please review the ticket?
            Hide
            swinbank John Swinbank added a comment - - edited

            Thanks — I think this is a useful summary, but it would benefit from a conclusion!

            The request on DM-12558 was (roughly) that we should investigate all subtasks should be “proper” subtasks so that they can be timed in a consistent way. Based on what you've learned, what do you think? It seems like there could be at least three possible answers:

            • The code is of a sufficient level of complexity and awkwardness that it's not practical to address this without a major rewrite which could take several weeks or months.
            • Refactoring to use “proper” would be a significant effort, but it could be done given several days or a week. Whether we do it now is really a question as to how highly Eric Bellm prioritises timing all of the subtasks vs. getting to grips with other issues.
            • It's trivial — we can convert everything to use subtasks in a few minutes or hours. We should just do it while it's fresh in your mind.

            I suspect the true answer is somewhere between the first and second options above, and that means we should just move on to other things... but do you agree with me?

            Show
            swinbank John Swinbank added a comment - - edited Thanks — I think this is a useful summary, but it would benefit from a conclusion! The request on DM-12558 was (roughly) that we should investigate all subtasks should be “proper” subtasks so that they can be timed in a consistent way. Based on what you've learned, what do you think? It seems like there could be at least three possible answers: The code is of a sufficient level of complexity and awkwardness that it's not practical to address this without a major rewrite which could take several weeks or months. Refactoring to use “proper” would be a significant effort, but it could be done given several days or a week. Whether we do it now is really a question as to how highly Eric Bellm prioritises timing all of the subtasks vs. getting to grips with other issues. It's trivial — we can convert everything to use subtasks in a few minutes or hours. We should just do it while it's fresh in your mind. I suspect the true answer is somewhere between the first and second options above, and that means we should just move on to other things... but do you agree with me?
            Hide
            swinbank John Swinbank added a comment - - edited

            We discussed this in person on 2019-02-19. We concluded:

            • Refactoring is likely a non-trivial exercise;
            • There isn't a coherent set of retargetable subtasks to be extracted — tasks are mainly being used here as proxies for functions, rather than for their task-like qualities;
            • We expect timing ip_diffim to be important in the long run, but in the first instance we'd probably learn more by going in with a profiler than we would by attempting to time all of the subtasks (although the later will probably eventually be necessary).

            Therefore, we concluded that no further work here is necessary at the moment. We can address this when any other refactoring is being undertaken (e.g. to add Butler G3 support).

            Show
            swinbank John Swinbank added a comment - - edited We discussed this in person on 2019-02-19. We concluded: Refactoring is likely a non-trivial exercise; There isn't a coherent set of retargetable subtasks to be extracted — tasks are mainly being used here as proxies for functions, rather than for their task-like qualities; We expect timing ip_diffim to be important in the long run, but in the first instance we'd probably learn more by going in with a profiler than we would by attempting to time all of the subtasks (although the later will probably eventually be necessary). Therefore, we concluded that no further work here is necessary at the moment. We can address this when any other refactoring is being undertaken (e.g. to add Butler G3 support).

              People

              • Assignee:
                gkovacs Gabor Kovacs
                Reporter:
                swinbank John Swinbank
                Reviewers:
                John Swinbank
                Watchers:
                Gabor Kovacs, John Swinbank
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel