I just experimented with using multiple threads/processes to speed up writing jointcal's output. Fortunately, concurrent.futures is very easy to use. Unfortunately, ProcessPoolExecutor doesn't work, because some of the objects it has to send to the worker processes are unpicklable. ThreadPoolExecutor worked just fine, with the tests passing, but it wasn't any faster.
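For reference, here is a minimal sketch of the general pattern, not the jointcal code itself. `Writer`, `write_one`, `write_all_threaded` and `can_pickle` are hypothetical names, and the `threading.Lock` member is only a stand-in for whatever in the real "self" can't be pickled. It shows why the thread pool works where the process pool can't: threads share memory, while a process pool has to pickle the bound method (and therefore "self") to hand it to a worker.

```python
import pickle
import threading
from concurrent.futures import ThreadPoolExecutor


class Writer:
    """Hypothetical stand-in for a task whose `self` holds unpicklable state."""

    def __init__(self):
        # A lock cannot be pickled; it stands in for any unpicklable member.
        self._lock = threading.Lock()
        self.written = []

    def write_one(self, name):
        # Placeholder for the body of the original loop.
        with self._lock:
            self.written.append(name)
        return name


def write_all_threaded(writer, names, max_workers=4):
    # Threads share the parent's memory, so nothing needs to be pickled.
    # Executor.map returns results in the order of the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(writer.write_one, names))


def can_pickle(obj):
    # A process pool pickles the callable and its arguments before sending
    # them to a worker, so this is a cheap check to run before trying one.
    try:
        pickle.dumps(obj)
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
    return True
```

With this, `write_all_threaded(Writer(), names)` runs fine, while `can_pickle(Writer().write_one)` is False, which is the same failure ProcessPoolExecutor hits. As for the lack of speedup with threads: one common cause is the GIL serializing CPU-bound Python code, but I haven't confirmed that is what is happening here.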
Once the pybind11 port is done, I should give this another try, including adding the necessary pybind11 code to pickle the things that need to be pickled.
Below is the rewrite of _write_results(), so I don't forget. _write_one_result() just contains the inside of the loop, using the new "self" objects. It would be worth considering what really should be in jointcal's "self" as part of this.