# Develop PipelineTask unit test framework

## Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points: 8
• Sprint: AP S20-2 (January), AP S20-3 (February), AP S20-4 (March)
• Team:

## Description

Most of our new pipeline tasks' functionality can be tested by writing unit tests against run or more specific methods (in theory, these tests should be identical to those from the Gen 2 era). However, such tests do not verify:

• whether a task's Connections are correctly written and whether they match the inputs and outputs of the run method
• any logic in a custom runQuantum method
• configuration logic, such as optional or alternative inputs or outputs

Since the Gen 3 API is unfamiliar to us, these aspects of a PipelineTask are the ones that are most likely to have bugs.

Currently, the only way to test these features is in large-scale runs on Gen 3 repositories (e.g., HSC). Such tests, while valuable, can only exercise a small subset of conditions (e.g., configs), can be expensive to debug (e.g., due to cascading failures), and do not protect against regressions (no CI). A pytest-compatible framework that lets us test those parts of a PipelineTask that lie outside run will let us catch problems much faster.

As part of DM-21875, I created a prototype test framework for direct Butler I/O and used it to verify that datasets could be stored to and retrieved from a dummy, obs-agnostic repository. I believe the same approach can be used to test PipelineTask functionality without the need to simulate a "realistic" Butler or depend on obs packages.

Desired features:

• a natural way for the test author to provide mock data IDs for the repository. The appropriate IDs will depend on the task being tested. It should be possible to simplify this from the prototype code, since most of the complexity of the Gen 3 Dimensions system is not needed for most tests; an exception may be ImageDifferenceTask's mix of detector-level and patch-level inputs.
• a simple activator that calls runQuantum without modifications other than mocking run
• a way to test that the desired inputs get passed to run, including self-consistent use of config flags and templates. This will probably involve mocking run and may involve mock datasets, which are more technically challenging.
• a way to verify the output of a (real) run call against a configured connections object
• analogous support for __init__ inputs and outputs, which I'm less familiar with
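
As a rough illustration of the "mock run" and "verify the output" features above, here is a minimal sketch using only the Python standard library. ToyTask, check_run_inputs, and check_declared_outputs are hypothetical names invented for this example; they are not part of pipe_base or daf_butler, and a real framework would work with an actual PipelineTask, its Connections, and a test Butler instead of plain dicts.

```python
import unittest.mock
from types import SimpleNamespace


class ToyTask:
    """Hypothetical stand-in for a PipelineTask (not the lsst.pipe.base class)."""

    def __init__(self, use_background=False):
        # Toy config flag that switches an optional input on or off.
        self.use_background = use_background

    def run(self, exposure, background=None):
        raise NotImplementedError("replaced by a mock in the checks below")

    def runQuantum(self, inputs):
        # Toy analogue of a custom runQuantum: choose inputs based on
        # the config flag, then delegate to run.
        kwargs = {"exposure": inputs["exposure"]}
        if self.use_background:
            kwargs["background"] = inputs["background"]
        return self.run(**kwargs)


def check_run_inputs(task, inputs):
    """Call runQuantum with run mocked out; return the mock for inspection."""
    with unittest.mock.patch.object(task, "run") as mock_run:
        task.runQuantum(inputs)
    return mock_run


def check_declared_outputs(declared_outputs, result):
    """Raise AssertionError if result lacks any declared output name."""
    missing = [name for name in declared_outputs if not hasattr(result, name)]
    if missing:
        raise AssertionError(f"run result lacks declared outputs: {missing}")


inputs = {"exposure": "exp-42", "background": "bkg-42"}

# With the flag off, only the required input should reach run.
mock_run = check_run_inputs(ToyTask(use_background=False), inputs)
mock_run.assert_called_once_with(exposure="exp-42")

# With the flag on, the optional input should be passed as well.
mock_run = check_run_inputs(ToyTask(use_background=True), inputs)
mock_run.assert_called_once_with(exposure="exp-42", background="bkg-42")

# A result carrying every declared output passes the check.
check_declared_outputs(["catalog"], SimpleNamespace(catalog=[]))
```

Because run is mocked, the first two checks exercise only the logic in runQuantum and the config flag, which is the part that ordinary unit tests against run do not cover.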

## Attachments

1. daf_butler_docs.tar.gz
563 kB
2. pipe_base_docs.tar.gz
352 kB

## Activity

Krzysztof Findeisen added a comment - edited

The goal was to make the more pipeliney parts of a PipelineTask unit-testable. I guess in terms of your questions it would be mostly "make sure it's valid before it runs", with a bit of "I think I got my connections straight from my quanta, but I'm not sure". I think you could use it to create a PipelineTask from scratch if you're doing test-driven development (i.e., write tests for a feature, code until they pass, then repeat), but that wasn't the approach I had in mind.

I was hoping test_pipelineTaskTests.py (yes, it's a terrible name, please suggest a better one) would serve as an example of what you could test with the framework, except on a real task instead of the nonsense VisitTask and PatchTask. I take it it's not a very good example?

Meredith Rawls added a comment -

Thanks, that makes the whole goal much clearer! I think I got bogged down in the tests because of the setup code duplication; it felt like you were plowing through each logical branch of the new functionality (which is good!) and not really providing useful examples. But then, I am several steps away from "let's write gen3 tests" in my day-to-day workflow, so I suspect you have provided useful examples and they're just not immediately relevant for me.

Name-wise, all I've got is test_pipeBaseTestUtils... and that's not much better. I suppose we can't exactly call it "look! examples for how to write gen3 PipelineTask unit tests!" Hmmm. test_pipelineTaskUtils? At least it only has "test" in it once.

Krzysztof Findeisen added a comment - edited

Ok, I've added documentation for both pipe_base and daf_butler (can't ping Meredith Rawls on the latter because of technical difficulties with GitHub). I haven't renamed runQuantum yet because I can't think of a name I think is a lesser evil; this might be a good point to revisit in the context of the user guide (which needs to talk about both runQuantums, and yes it's confusing).

Built versions of both sets of documentation attached to this issue.

Meredith Rawls added a comment -

The docs are quite clear, thank you for your work on this! The little toy examples are particularly useful. Besides the minor GitHub comments, I think the only thing that remains is renaming runQuantum, because it's confusing to have an overloaded method, so to speak. The best I can think of is runPseudoQuantum? Even runTestQuantum would be better. Just anything to differentiate it from "normal" runQuantum. Assuming you can come up with something reasonable there and implement it, I think this is good to go.

Krzysztof Findeisen added a comment -

Ok, I'll go with runTestQuantum then. Thanks for the feedback!

## People

• Assignee: Krzysztof Findeisen
• Reporter: Krzysztof Findeisen
• Reviewers: Meredith Rawls
• Watchers: John Parejko, John Swinbank, Krzysztof Findeisen, Meredith Rawls