Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-26526

Can't run RawIngestTask with processes != 1

    XMLWordPrintable

    Details

      Description

      Attempting to run RawIngestTask with multiple processes on DM-25806, after fixing a pickling problem with Gen3DatasetIngestTask, produces the following stack trace:

      Process ForkPoolWorker-1:
      Traceback (most recent call last):
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
          self.run()
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/process.py", line 99, in run
          self._target(*self._args, **self._kwargs)
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/pool.py", line 110, in worker
          task = get()
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/queues.py", line 354, in get
          return _ForkingPickler.loads(res)
      TypeError: __init__() takes from 1 to 2 positional arguments but 5 were given
      Process ForkPoolWorker-2:
      Traceback (most recent call last):
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
          self.run()
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/process.py", line 99, in run
          self._target(*self._args, **self._kwargs)
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/pool.py", line 110, in worker
          task = get()
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/queues.py", line 354, in get
          return _ForkingPickler.loads(res)
      TypeError: __init__() takes from 1 to 2 positional arguments but 5 were given
      Process ForkPoolWorker-3:
      Traceback (most recent call last):
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
          self.run()
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/process.py", line 99, in run
          self._target(*self._args, **self._kwargs)
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/pool.py", line 110, in worker
          task = get()
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/queues.py", line 354, in get
          return _ForkingPickler.loads(res)
      TypeError: __init__() takes from 1 to 2 positional arguments but 5 were given
      ^CProcess ForkPoolWorker-4:
      Process ForkPoolWorker-5:
      Traceback (most recent call last):
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/pool.py", line 733, in next
          item = self._items.popleft()
      IndexError: pop from an empty deque
       
      During handling of the above exception, another exception occurred:
       
      Traceback (most recent call last):
        File "/scratch/krzys001/ap_verify/bin/ingest_dataset.py", line 29, in <module>
          result = runIngestion()
        File "/scratch/krzys001/ap_verify/python/lsst/ap/verify/ap_verify.py", line 238, in runIngestion
          ingestDatasetGen3(args.dataset, workspace, processes=args.processes)
        File "/scratch/krzys001/ap_verify/python/lsst/ap/verify/ingestion.py", line 628, in ingestDatasetGen3
          ingester.run(processes=processes)
        File "/scratch/krzys001/ap_verify/python/lsst/ap/verify/ingestion.py", line 479, in run
          self._ensureRaws(processes=processes)
        File "/scratch/krzys001/ap_verify/python/lsst/ap/verify/ingestion.py", line 511, in _ensureRaws
      Traceback (most recent call last):
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
          self.run()
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/process.py", line 99, in run
          self._target(*self._args, **self._kwargs)
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/pool.py", line 110, in worker
          task = get()
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/queues.py", line 351, in get
          with self._rlock:
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
          return self._semlock.__enter__()
          self._ingestRaws(dataFiles, processes=processes)
      KeyboardInterrupt
        File "/scratch/krzys001/ap_verify/python/lsst/ap/verify/ingestion.py", line 538, in _ingestRaws
      Traceback (most recent call last):
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
          self.run()
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/process.py", line 99, in run
          self._target(*self._args, **self._kwargs)
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/pool.py", line 110, in worker
          task = get()
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/queues.py", line 352, in get
          res = self._reader.recv_bytes()
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
          buf = self._recv_bytes(maxlength)
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
          buf = self._recv(4)
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
          chunk = read(handle, remaining)
      KeyboardInterrupt
          self.ingester.run(dataFiles, run=None, processes=processes)
        File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/obs_base/20.0.0-36-g2de6156+58b4951e8a/python/lsst/obs/base/ingest.py", line 443, in run
          exposureData = self.prep(files, pool=pool, processes=processes)
        File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/obs_base/20.0.0-36-g2de6156+58b4951e8a/python/lsst/obs/base/ingest.py", line 361, in prep
          exposureData: List[RawExposureData] = self.groupByExposure(fileData)
        File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/obs_base/20.0.0-36-g2de6156+58b4951e8a/python/lsst/obs/base/ingest.py", line 280, in groupByExposure
          for f in files:
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/pool.py", line 737, in next
          self._cond.wait(timeout)
        File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/threading.py", line 296, in wait
          waiter.acquire()
      

      Discussion on #dm-middleware revealed that this functionality has never been tested. Investigate this bug in a pure-obs_base context and fix the problem. Unit tests for pickling support of RawIngestTask and related classes are a good starting point.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              krzys Krzysztof Findeisen
              Reporter:
              krzys Krzysztof Findeisen
              Reviewers:
              Jim Bosch
              Watchers:
              Jim Bosch, Krzysztof Findeisen, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.