Status: Won't Fix
Fix Version/s: None
While it is important to use fixed random seeds to prevent an unlucky random value from causing a test to fail, a handful of tests seem to depend too heavily on the specific seed they are given. This ticket should act both as a log of tests that may be too sensitive to the seed value and, eventually, as a record of corrections to those tests.
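As a minimal sketch of the pattern in question (purely illustrative; the test class, method, and tolerance here are invented, not taken from any of the tests named in this ticket), fixing the seed makes a test reproducible, but it also means the tolerance is implicitly tuned to that one draw:

```python
import unittest

import numpy as np


class NoiseTestCase(unittest.TestCase):
    """Illustrative test: a fixed seed gives reproducibility,
    but the pass/fail outcome is tied to that particular draw."""

    def test_mean_of_noise(self):
        rng = np.random.default_rng(seed=42)  # fixed seed
        noise = rng.normal(loc=0.0, scale=1.0, size=1000)
        # A different seed could move this statistic past a tolerance
        # that was (perhaps unknowingly) chosen with seed 42 in mind.
        self.assertLess(abs(noise.mean()), 0.2)


if __name__ == "__main__":
    unittest.main()
```

The question this ticket raises is how to choose that tolerance so it reflects the algorithm's actual spread over seeds rather than the behaviour of one seed.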
Also, afw/tests/testChebyshevBoundedField.py: see DM-7461. Changing the random seed can alter the results of testEvaluate enough for the difference to exceed the tolerance by a factor of 2, and the results are also sensitive to MKL vs. no MKL.
There's a test in pipe_tasks that can fail because of random number sensitivity.
I'm unsure of where to go with this ticket.
I've confirmed that I can cause both test_apCorrMap.py and test_dipole.py to fail by changing the seed (although it took me several attempts to find a seed that fails). I didn't manage to get a failure in test_chebyshevBoundedField.py, but I'm prepared to believe I would if I kept playing with the seed for long enough.
But... is this actually a problem? In all of those cases, we could avoid the problem by loosening the test tolerance. Would that be generally useful? It's not obvious to me that it would.
I'm happy to hear thoughts, but I'm inclined to close this as “won't fix”, and invite folks to file bugs against specific tests describing exactly what they want changed.
I do worry that the random seed variation is not understood, so we aren't entirely sure what the right tolerance is for each test. Ideally I'd like to know the distribution of answers as we change the seed, and determine from that distribution whether the answers are all acceptable or whether it's telling us that the algorithm itself is unstable and will cause us grief later on. I understand that this is a lot of work, though, so I'm not going to block a Won't Fix.
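As a sketch of what that seed sweep might look like (purely illustrative; `run_algorithm` is a hypothetical stand-in for the computation under test, not an actual function in afw or pipe_tasks):

```python
import numpy as np


def run_algorithm(rng):
    """Hypothetical stand-in for the algorithm under test: here,
    just the mean of some generated noise."""
    return rng.normal(loc=0.0, scale=1.0, size=1000).mean()


def sweep_seeds(n_seeds=500):
    """Run the algorithm over many seeds and summarize the
    distribution of results, to inform a tolerance choice."""
    results = np.array(
        [run_algorithm(np.random.default_rng(seed)) for seed in range(n_seeds)]
    )
    return results.mean(), results.std(), np.abs(results).max()


mean, std, worst = sweep_seeds()
print(f"mean={mean:.4f}  std={std:.4f}  worst |result|={worst:.4f}")
```

A tolerance set from the tail of that distribution would tell us whether a seed-induced failure indicates a fragile test or an unstable algorithm; a tolerance far outside the observed spread would suggest the latter.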
I'm nervous about the idea that we can post-facto assess the scientific validity of algorithms based on unit test outputs.
If the test were carefully written together with the algorithm with an eye to being used for this purpose, I think it'd be a great idea. Where that wasn't the case, though, I think it'd be a lot of work for minimal gain — I'd rather treat these tests as effectively regression tests, and assess scientific validity of our algorithms by large-scale data processing campaigns.
So far, the following tests have been noted to be sensitive to the random seed:

- test_apCorrMap.py (afw)
- test_chebyshevBoundedField.py (afw; testEvaluate, see DM-7461)
- test_dipole.py
- an unnamed test in pipe_tasks