What causes zombie PoolWorkers in Python 2.6?

Time: 2019-05-15 21:21:16

Tags: python exception python-multiprocessing python-2.6 zombie-process

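Using Python 2.6: in my multiprocessing implementation, some worker processes become zombies before all of the listed files have been processed successfully. Most of the time this is harmless and only slows processing down, because the remaining workers can finish the job. Occasionally, though, every worker becomes a zombie, and the script hangs and stops iterating over further directories.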

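I iterate over a list of files in each of a list of directories, one directory at a time, and use the multiprocessing module to cut processing time. Sometimes, however, a particular file cannot be processed because of complications in another program that I am not responsible for. To work around this, I added a TimeoutException class so that a file which has not finished within a set time is put back on a queue and reprocessed by another worker.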

import glob
import os
import signal
import time
from multiprocessing import Pool, Queue

def f_init(q):
  # Pool initializer: give every worker a handle on the shared retry queue
  processMethod.q = q

class TimeoutException(Exception):
  pass

def handler(signum, frame):
  # SIGALRM handler: raises in whatever code the process is running at that moment
  raise TimeoutException()

def processMethod(f):
  limit = 84
  try:
    signal.signal(signal.SIGALRM, handler)
    signal.alarm(240)

    # {data processing}

    newfiles = len(glob.glob("*" + "fdate.jpg"))
    if newfiles < limit:
      time.sleep(240)

    return 1

  except TimeoutException:
    # Timed out: hand the file back so that another worker can retry it
    processMethod.q.put(f)

    return None

def main(directory):
  total_items = len(directory)
  successful = []
  failure_tracker = []

  q = Queue()
  p = Pool(15, f_init, [q])
  results = p.imap(processMethod, directory)
  retry_results = []

  while len(successful) < total_items:
    successful.extend([r for r in results if r is not None])
    successful.extend([r for r in retry_results if r is not None])
    # Drain the retry queue and resubmit anything that timed out
    failed_items = []
    while not q.empty():
      failed_items.append(q.get())
    if failed_items:
      failure_tracker.append(failed_items)
      retry_results = p.imap(processMethod, failed_items)
  p.close()
  p.join()

  return

if __name__ == "__main__":
  directory = os.listdir("/sourcedir")

  main(directory)

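I don't understand what is causing the error. My expectation is that if any process takes longer than 240 seconds, it gets kicked back to main() and the file is added to "failed_items". So far every failed file has been reprocessed correctly, but worker processes still hang from time to time. Here is an example of the traceback printed to the terminal when a given worker becomes a zombie: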

Process PoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 57, in worker
    task = get()
  File "/usr/lib64/python2.6/multiprocessing/queues.py", line 350, in get
    racquire()
  File "/my/home/dir/myScript.py", line 47, in handler
    raise TimeoutException()
TimeoutException

Sometimes the traceback is slightly different:

Process PoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 57, in worker
    task = get()
  File "/usr/lib64/python2.6/multiprocessing/queues.py", line 352, in get
    return recv()
  File "/my/home/dir/myScript.py", line 47, in handler
    raise TimeoutException()
TimeoutException
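
Both tracebacks have the same shape: handler runs while the worker is inside the pool's own get() call (racquire() or recv()), which is where an idle worker waits for its next task, not inside processMethod. One reading, offered only as a hypothesis, is that a timer armed by an earlier call was still pending when that call returned. The sketch below reproduces that behaviour in a single process on a Unix system. The names task_without_cancel and alarm_fires_after_return are hypothetical, and signal.setitimer with a sub-second delay stands in for signal.alarm(240) so that the example finishes quickly.

```python
import signal
import time


class TimeoutException(Exception):
    pass


def handler(signum, frame):
    raise TimeoutException()


def task_without_cancel(delay):
    """Arm a timer, finish early, and return without disarming it."""
    signal.signal(signal.SIGALRM, handler)
    # Short stand-in for signal.alarm(240) in the question.
    signal.setitimer(signal.ITIMER_REAL, delay)
    try:
        return 1  # the "work" finishes well before the timer expires
    except TimeoutException:
        return None


def alarm_fires_after_return(delay=0.2):
    """Return True if the timer goes off after the task has returned."""
    result = task_without_cancel(delay)
    assert result == 1
    try:
        # Stand-in for an idle worker blocking in the pool's get().
        time.sleep(delay * 5)
    except TimeoutException:
        return True
    finally:
        # Disarm the timer so nothing is left pending after the demo.
        signal.setitimer(signal.ITIMER_REAL, 0)
    return False
```

If alarm_fires_after_return() comes back True, the exception was raised outside the try block that was meant to catch it, which is the same situation the tracebacks above appear to show.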

Is this a problem with raising the TimeoutException, or a problem with the Pool/Queue itself? Because the hangs are so sporadic, I am quite confused.
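
One plausible answer, assuming the timer is what interrupts the idle workers: the problem would lie in how the alarm is used rather than in Pool or Queue. processMethod arms signal.alarm(240) on every call but never disarms it, so when a file finishes early the timer keeps running and can fire while the worker waits in get(), outside any try block. The uncaught exception then ends that worker. As far as I know, the Pool in Python 2.6 does not start replacement workers (that behaviour arrived in 2.7), so once all 15 are gone imap would wait forever. A minimal sketch of disarming the timer in a finally clause follows. The helper run_with_timeout is hypothetical, and setitimer is used only so the demonstration can use sub-second timeouts.

```python
import signal
import time


class TimeoutException(Exception):
    pass


def handler(signum, frame):
    raise TimeoutException()


def run_with_timeout(func, arg, seconds):
    """Call func(arg); return its result, or None if it exceeds `seconds`."""
    signal.signal(signal.SIGALRM, handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func(arg)
    except TimeoutException:
        return None
    finally:
        # Always disarm the timer, whether the call finished or timed out,
        # so it cannot fire later while the process is doing something else.
        signal.setitimer(signal.ITIMER_REAL, 0)


if __name__ == "__main__":
    print(run_with_timeout(lambda x: x + 1, 1, 0.5))        # finishes in time
    print(run_with_timeout(lambda x: time.sleep(2), 1, 0.1))  # times out
```

In processMethod the same effect could be had by calling signal.alarm(0) in a finally clause. A small window remains in which the timer can expire just before it is disarmed, so this narrows the problem rather than ruling it out completely.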

0 answers:

No answers