Data Management / DM-9071

Experiment with ProcessPoolExecutor for reading/writing jointcal results

    Details

      Description

      I just experimented with using multiple threads/processes to speed up writing jointcal's output. Fortunately, concurrent.futures is very easy to use. Unfortunately, ProcessPoolExecutor doesn't work because some of the objects involved are unpicklable. ThreadPoolExecutor worked fine and the tests passed, but it wasn't any faster.

      Once the pybind11 port is done, I should give this another try, including adding the necessary pybind11 code to pickle the things that need to be pickled.

      Below is the rewrite of _write_results(), so I don't forget. _write_one_result() just contains the inside of the loop, using the new "self" objects. It would be worth considering what really should be in jointcal's "self" as part of this.

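              # Stash on self what _write_one_result() needs; with a process pool,
              # all of self must be picklable for the bound method to be sent.
              self.astrom_model = astrom_model
              self.photom_model = photom_model
              self.visit_ccd_to_dataRef = visit_ccd_to_dataRef

              import concurrent.futures
              ccdImageList = associations.getCcdImageList()
              with concurrent.futures.ProcessPoolExecutor() as executor:
                  # map() only re-raises worker exceptions when its results are
                  # consumed, so wrap it in list() to surface failures.
                  list(executor.map(self._write_one_result, ccdImageList))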
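
To illustrate why the thread pool runs while the process pool does not, here is a minimal sketch using only the standard library. `Writer`, `write_one`, and `can_pickle` are hypothetical stand-ins, not jointcal code, and a `threading.Lock` stands in for whatever unpicklable object the real task holds. A process pool has to pickle the bound method (and with it the whole instance) to send it to a worker, while threads share memory and never pickle anything.

```python
import pickle
import threading
from concurrent.futures import ThreadPoolExecutor


class Writer:
    """Hypothetical stand-in for a task object that holds unpicklable state."""

    def __init__(self):
        # A lock is a simple example of an object that pickle refuses to serialize.
        self.lock = threading.Lock()
        self.written = []

    def write_one(self, item):
        with self.lock:
            self.written.append(item)
        return item * 2


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, which a process pool requires."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


writer = Writer()

# Threads share memory, so the bound method is never pickled and this works.
with ThreadPoolExecutor(max_workers=2) as executor:
    thread_results = list(executor.map(writer.write_one, [1, 2, 3]))

# A process pool would have to pickle writer.write_one (and therefore writer).
bound_method_picklable = can_pickle(writer.write_one)
```

Checking `can_pickle(self._write_one_result)` before switching executors is a cheap way to find out whether a process pool has any chance of working.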
      

            Activity

            John Parejko added a comment:

            Helpful comments from Paul Price about how to figure out what needs pickling here:

            Try `with lsst.ctrl.pool.pool.pickleSniffer(): doSomethingWithPickle()`, or decorate the function with `@lsst.ctrl.pool.pool.catchPicklingError`.
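
Where those helpers are not at hand, a rough stdlib-only alternative is to try pickling each attribute of the object separately and report the ones that fail. This is only a sketch: `find_unpicklable_attrs` and `FakeTask` are hypothetical names, and the lock again stands in for an unpicklable wrapped object.

```python
import pickle
import threading


def find_unpicklable_attrs(obj):
    """Return {attribute name: error message} for attributes pickle rejects."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle raises several exception types
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


class FakeTask:
    """Hypothetical stand-in for the jointcal task instance."""

    def __init__(self):
        self.name = "jointcal"
        self.visits = [1, 2, 3]
        self.handle = threading.Lock()  # stands in for an unpicklable wrapped object


failures = find_unpicklable_attrs(FakeTask())
```

This only looks one level deep; an attribute that fails can be passed back into the same function to narrow things down further.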

            John Parejko added a comment:

            Relatedly, it would be worth trying to make _build_ccdImage thread/process safe (possibly by doing AddImage as a separate step?), and trying the same thing with it. Other than AddImage, I think that loop is trivially parallel.

            John Parejko added a comment:

            Here's one possibility for the above, though it's not great, with _build_ccdImage returning a tuple of the arguments to AddImage in addition to the Result namedtuple:

                    with pipeBase.cmdLineTask.profile(load_cat_prof_file):
                        import concurrent.futures
                        with concurrent.futures.ProcessPoolExecutor() as executor:
                            mapped = executor.map(self._build_ccdImage, dataRefs)
                        for (stuff, result), ref in zip(mapped, dataRefs):
                            associations.AddImage(*stuff, jointcalControl)
                            oldWcsList.append(result.wcs)
                            visit_ccd_to_dataRef[result.key] = ref
            

            This doesn't work as written, but it did work with a ThreadPoolExecutor(), so there's hope, once I sort out the pickling.
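
For reference, the shape of that pattern (parallel map over the independent part, then a serial loop for the step that mutates shared state) can be sketched with stand-ins. Everything here is hypothetical: `build_one` plays the role of `_build_ccdImage`, `FakeAssociations.add_image` the role of `AddImage`, and a thread pool is used because that is the variant that ran.

```python
from collections import namedtuple
from concurrent.futures import ThreadPoolExecutor

Result = namedtuple("Result", ["wcs", "key"])


def build_one(data_ref):
    """Hypothetical stand-in for _build_ccdImage: no shared state is touched."""
    add_image_args = (data_ref, data_ref * 10)
    result = Result(wcs=f"wcs-{data_ref}", key=(data_ref, 0))
    return add_image_args, result


class FakeAssociations:
    """Hypothetical stand-in for the object that AddImage mutates."""

    def __init__(self):
        self.images = []

    def add_image(self, *args):
        self.images.append(args)


data_refs = [1, 2, 3]
associations = FakeAssociations()
old_wcs_list = []
visit_ccd_to_data_ref = {}

with ThreadPoolExecutor() as executor:
    # map() yields results in input order, so zip() with data_refs stays aligned.
    mapped = list(executor.map(build_one, data_refs))

# The mutating step runs serially, in the parent, after the parallel part.
for (args, result), ref in zip(mapped, data_refs):
    associations.add_image(*args, "control")
    old_wcs_list.append(result.wcs)
    visit_ccd_to_data_ref[result.key] = ref
```

Switching this to a process pool would additionally require the worker function, its argument, and its return value to be picklable.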

            John Parejko added a comment:

            I've pushed a branch with some more attempts, including trying to pickle ccdImage and a cleaned-up version of the addImage code above. The reading .map() fails with an error about __init__ arguments, but doesn't say which __init__ is failing. The writing .map() fails with `RuntimeError: make_tuple(): unable to convert arguments of types 'std::tuple<object, ... object>' to Python object`. Getting the ccdImage pickle to work might not be necessary for the read parallelization; on the other hand, parallel reading may require being able to pickle a whole bunch of afw objects: all the metadata.
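
As a general illustration of how pickle support and `__init__` arguments interact (not a diagnosis of the failure above, whose cause is unknown), a class can tell pickle how to rebuild it through `__reduce__`. If the argument tuple it returns does not match the constructor, unpickling fails with a `TypeError` that mentions `__init__`. `FakeCcdImage` and `BrokenCcdImage` are hypothetical pure-Python stand-ins, not the real ccdImage.

```python
import pickle


class FakeCcdImage:
    """Hypothetical stand-in for a wrapped C++ object; not the real ccdImage."""

    def __init__(self, visit, ccd):
        self.visit = visit
        self.ccd = ccd

    def __reduce__(self):
        # pickle will call FakeCcdImage(visit, ccd) in the receiving process.
        return (type(self), (self.visit, self.ccd))


class BrokenCcdImage(FakeCcdImage):
    def __reduce__(self):
        # Deliberately wrong: one argument is missing, so unpickling fails.
        return (type(self), (self.visit,))


restored = pickle.loads(pickle.dumps(FakeCcdImage(1234, 5)))

try:
    pickle.loads(pickle.dumps(BrokenCcdImage(1234, 5)))
    broken_error = None
except TypeError as exc:
    broken_error = str(exc)
```

Round-tripping each candidate object through `pickle.loads(pickle.dumps(obj))` in the parent process, as done here, tends to give a clearer traceback than the same failure surfacing from inside a worker.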


              People

              • Assignee: John Parejko
                Reporter: John Parejko
                Watchers: John Parejko, Paul Price, Russell Owen, Simon Krughoff