  Data Management / DM-9541

Bug related to MPI pickling when running coaddDriver

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: ctrl_pool
    • Labels:
      None

      Description

      As noted in DM-9447, the following error was encountered when running coaddDriver.py on the COSMOS HSC-Y subset of the HSC RC dataset (DM-6816):

      SystemError on tiger-r6c3n9:11271 in reduce: Negative size passed to PyString_FromStringAndSize
      Traceback (most recent call last):
        File "/tigress/HSC/LSST/stack_20160915/Linux64/ctrl_pool/12.1-7-gb57f33e+8/python/lsst/ctrl/pool/pool.py", line 113, in wrapper
          return func(*args, **kwargs)
        File "/tigress/HSC/LSST/stack_20160915/Linux64/ctrl_pool/12.1-7-gb57f33e+8/python/lsst/ctrl/pool/pool.py", line 237, in wrapper
          return func(*args, **kwargs)
        File "/tigress/HSC/LSST/stack_20160915/Linux64/ctrl_pool/12.1-7-gb57f33e+8/python/lsst/ctrl/pool/pool.py", line 747, in reduce
          results = self.comm.gather(None, root=self.root)
        File "MPI/Comm.pyx", line 1281, in mpi4py.MPI.Comm.gather (src/mpi4py.MPI.c:108949)
        File "MPI/msgpickle.pxi", line 664, in mpi4py.MPI.PyMPI_gather (src/mpi4py.MPI.c:47643)
        File "MPI/msgpickle.pxi", line 179, in mpi4py.MPI.Pickle.allocv (src/mpi4py.MPI.c:41800)
        File "MPI/msgpickle.pxi", line 127, in mpi4py.MPI.Pickle.alloc (src/mpi4py.MPI.c:40945)
      SystemError: Negative size passed to PyString_FromStringAndSize
      application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
      

      The same processing ran without error on the weekly stack from the first week of January 2017 (I'm not sure which exact version was used).
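The traceback fails inside mpi4py's `Pickle.allocv`, which sizes the buffer the root rank uses to receive everyone's pickled results in `comm.gather`. One plausible explanation is offered here as an assumption, because this issue does not state the root cause. The per-rank pickle sizes may be summed in a signed 32-bit C `int`, so once the combined results sent to the root exceed 2**31 - 1 bytes (about 2 GiB), the total wraps to a negative number. The minimal sketch below only simulates that arithmetic. `to_int32` and `total_gather_size` are hypothetical helpers, not part of mpi4py or ctrl_pool.

```python
# Simulate how a signed 32-bit C 'int' accumulator overflows when summing
# per-rank message sizes. Assumption: the gather's receive-buffer size is
# held in such an int (this issue does not confirm the root cause).

INT32_MAX = 2**31 - 1


def to_int32(value: int) -> int:
    """Wrap an arbitrary Python int into the signed 32-bit range (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def total_gather_size(sizes: list[int]) -> int:
    """Hypothetical helper: sum per-rank pickle sizes as a 32-bit C int would."""
    total = 0
    for size in sizes:
        total = to_int32(total + size)
    return total


if __name__ == "__main__":
    one_rank = 800_000_000  # bytes in one rank's pickled result (illustrative)
    print(total_gather_size([one_rank] * 2))  # 1600000000: still fits in 32 bits
    print(total_gather_size([one_rank] * 3))  # -1894967296: exceeds 2**31 - 1 and wraps
```

For example, three ranks each returning an 800,000,000-byte pickle give a wrapped total of -1,894,967,296. A negative buffer size like that is what `PyString_FromStringAndSize` rejects.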

            Activity

            hchiang2 Hsin-Fang Chiang added a comment -

            The code changes look fine to me, although I don't fully understand how pool works. I didn't run any tests.

            lauren Lauren MacArthur added a comment -

            I am running the test now (the original command that triggered this error). All looks good so far. I'll post again once it has run to completion.

            price Paul Price added a comment -

            Lauren MacArthur, did the test work?

            lauren Lauren MacArthur added a comment -

            Yes

            price Paul Price added a comment -

            Awesome!

            Merged to master.

            Thanks, all!


              People

              Assignee:
              price Paul Price
              Reporter:
              lauren Lauren MacArthur
              Reviewers:
              Hsin-Fang Chiang
              Watchers:
              Hsin-Fang Chiang, John Swinbank, Lauren MacArthur, Paul Price, Pim Schellart [X] (Inactive)
              Votes:
              0
