当我将num_workers
设置为大(如10)时,我的dask子进程由于某种原因没有终止。我的工作是运行在100多台核心机器上,并在50GB文件上运行类似字数的代码。 stacktrace看起来像这样:
Traceback (most recent call last):
Process PoolWorker-9:
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process PoolWorker-1:
Traceback (most recent call last):
Process PoolWorker-6:
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process PoolWorker-4:
Traceback (most recent call last):
Process PoolWorker-8:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process PoolWorker-2:
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Traceback (most recent call last):
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
Process PoolWorker-5:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process PoolWorker-10:
task = get()
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self.run()
self.run()
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
self._target(*self._args, **self._kwargs)
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
racquire()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
KeyboardInterrupt
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
task = get()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
self._target(*self._args, **self._kwargs)
task = get()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
task = get()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
racquire()
KeyboardInterrupt
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
task = get()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
racquire()
KeyboardInterrupt
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
racquire()
return recv()
task = get()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
racquire()
KeyboardInterrupt
task = get()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
racquire()
KeyboardInterrupt
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
KeyboardInterrupt
KeyboardInterrupt
self.run()
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
self.run()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
(类似于其他工人)
知道发生了什么事吗?在较小的输入(100MB)上运行的相同作业总是终止。