# Experiment with ProcessPoolExecutor for reading/writing jointcal results

XMLWordPrintable

#### Details

• Type: Story
• Status: Invalid
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points:
4
• Team:

#### Description

I just experimented with using multiple threads/processes to accelerate writing jointcal's output. Fortunately, concurrent.futures is very easy to use. Unfortunately, the ProcessPoolExecutor doesn't work because there are unpickleable objects. ThreadPoolExecutor worked just fine, with the tests passing, but it wasn't any faster.

Once the pybind11 port is done, I should give this another try, including adding the necessary pybind11 code to pickle the things that need to be pickled.

Below is the rewrite of _write_results(), so I don't forget. _write_one_result() just contains the inside of the loop, using the new "self" objects. It would be worth considering what really should be in jointcal's "self" as part of this.

  self.astrom_model = astrom_model  self.photom_model = photom_model  self.visit_ccd_to_dataRef = visit_ccd_to_dataRef    import concurrent.futures  ccdImageList = associations.getCcdImageList()  with concurrent.futures.ProcessPoolExecutor() as executor:  executor.map(self._write_one_result, ccdImageList) 

#### Activity

Hide
John Parejko added a comment -

Helpful comments from Paul Price about how to figure out what needs pickling here:

Try with lsst.ctrl.pool.pool.pickleSniffer(): doSomethingWithPickle(), or decorate the function with @lsst.ctrl.pool.pool.catchPicklingError.

Show
John Parejko added a comment - Helpful comments from Paul Price about how to figure out what needs pickling here: Try with lsst.ctrl.pool.pool.pickleSniffer(): doSomethingWithPickle(), or decorate the function with @lsst.ctrl.pool.pool.catchPicklingError.
Hide
John Parejko added a comment -

Relatedly, it would be worth trying to make _build_ccdImage thread/process safe (possibly by doing AddImage as a separate step?), and trying the same thing with it. Other than AddImage, I think that loop is trivially parallel.

Show
John Parejko added a comment - Relatedly, it would be worth trying to make _build_ccdImage thread/process safe (possibly by doing AddImage as a separate step?), and trying the same thing with it. Other than AddImage, I think that loop is trivially parallel.
Hide
John Parejko added a comment -

Here's one possibility for the above, though it's not great, with _build_ccdImage returning a tuple of the arguments to AddImage in addition to the Result namedtuple:

  with pipeBase.cmdLineTask.profile(load_cat_prof_file):  import concurrent.futures  with concurrent.futures.ProcessPoolExecutor() as executor:  mapped = executor.map(self._build_ccdImage, dataRefs)  for (stuff, result), ref in zip(mapped, dataRefs):  associations.AddImage(*stuff, jointcalControl)  oldWcsList.append(result.wcs)  visit_ccd_to_dataRef[result.key] = ref 

This doesn't work as written, but it did work with a ThreadPoolExecutor(), so there's hope, once I sort out the pickling.

Show
John Parejko added a comment - Here's one possibility for the above, though it's not great, with _build_ccdImage returning a tuple of the arguments to AddImage in addition to the Result namedtuple: with pipeBase.cmdLineTask.profile(load_cat_prof_file): import concurrent.futures with concurrent.futures.ProcessPoolExecutor() as executor: mapped = executor.map(self._build_ccdImage, dataRefs) for (stuff, result), ref in zip(mapped, dataRefs): associations.AddImage(*stuff, jointcalControl) oldWcsList.append(result.wcs) visit_ccd_to_dataRef[result.key] = ref This doesn't work as written, but it did work with a ThreadPoolExecutor() , so there's hope, once I sort out the pickling.
Hide
John Parejko added a comment -

I've pushed a branch with some more attempts, including trying to pickle ccdImage and a cleaned up version of the above addImage code. The reading .map() fails with an error about __init__ arguments, but doesn't specify what init is failing, while the writing fails with a RuntimeError: make_tuple(): unable to convert arguments of types 'std::tuple<object, ... object>' to Python object error. Getting the ccdImage pickle to work might not be necessary to get the read parallelization to work, but on the other hand parallel reading may require being able to pickle a whole bunch of afw objects: all the metadata.

Show
John Parejko added a comment - I've pushed a branch with some more attempts, including trying to pickle ccdImage and a cleaned up version of the above addImage code. The reading .map() fails with an error about __init__ arguments, but doesn't specify what init is failing, while the writing fails with a RuntimeError: make_tuple(): unable to convert arguments of types 'std::tuple<object, ... object>' to Python object error. Getting the ccdImage pickle to work might not be necessary to get the read parallelization to work, but on the other hand parallel reading may require being able to pickle a whole bunch of afw objects: all the metadata.
Hide
John Parejko added a comment -

Gen3 jointcal reads and writes aggregated visit-level SourceCatalogs and ExposureCatalogs, so this approach is no longer necessary.

Show
John Parejko added a comment - Gen3 jointcal reads and writes aggregated visit-level SourceCatalogs and ExposureCatalogs, so this approach is no longer necessary.

#### People

Assignee:
John Parejko
Reporter:
John Parejko
Watchers:
John Parejko, Paul Price, Russell Owen, Simon Krughoff