Too many open files error with Parallel Python subprocesses

Time: 2012-12-20 08:51:07

Tags: python subprocess parallel-python

Related questions: "Parallel Python - too many files", "Python too many open files (subprocesses)"

I am using Parallel Python [V1.6.2] to run tasks. Each task processes an input file and writes a log/report. Say there are 10 folders, each containing 5000~20000 files, which are read, processed, and logged in parallel. Each file is roughly 50KB~250KB.

After running for about 6 hours, Parallel Python fails with the following error.

  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 342, in __init__
  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 506, in set_ncpus
  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 140, in __init__
  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 146, in start
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
  File "/usr/lib/python2.7/subprocess.py", line 1135, in _execute_child
  File "/usr/lib/python2.7/subprocess.py", line 1091, in pipe_cloexec
OSError: [Errno 24] Too many open files
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 66, in  apport_excepthook
ImportError: No module named fileutils

Original exception was:
Traceback (most recent call last):
  File "PARALLEL_TEST.py", line 746, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 342, in __init__
  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 506, in set_ncpus
  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 140, in __init__
  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 146, in start
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
  File "/usr/lib/python2.7/subprocess.py", line 1135, in _execute_child
  File "/usr/lib/python2.7/subprocess.py", line 1091, in pipe_cloexec
OSError: [Errno 24] Too many open files

While I understand this is probably the subprocess issue described here, http://bugs.python.org/issue2320, it seems the fix only became part of Python 3.2. I am currently tied to Python 2.7.

I would like to know whether the following suggestions help: [1] http://www.parallelpython.com/component/option,com_smf/Itemid,1/topic,313.0

*) Add worker.t.close() in the destroy() method of /usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py

*) Increase BROADCAST_INTERVAL in /usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/ppauto.py

I would like to know whether there is a fix/workaround for this issue in Python 2.7.
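To confirm that descriptors really are leaking over time, one can watch the process's open-descriptor count while the jobs run. A minimal, Linux-only diagnostic sketch (not from the original post; it assumes /proc/self/fd is available, as on the Ubuntu system the traceback suggests):

```python
import os
import tempfile

def open_fd_count():
    # /proc/self/fd lists one entry per open file descriptor of this
    # process (Linux-specific).
    return len(os.listdir("/proc/self/fd"))

before = open_fd_count()
tmp = tempfile.TemporaryFile()   # opens exactly one descriptor
during = open_fd_count()
tmp.close()                      # releases it again
after = open_fd_count()

print(during - before, after - before)
```

Logging this count periodically from the main process makes it easy to see whether descriptors accumulate with each batch of worker jobs.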

Thanks in advance.

2 answers:

Answer 0 (score: 1)

My team recently ran into the same file-handle resource exhaustion problem while running celeryd task-queue jobs. I believe the OP has nailed it: it is most likely sloppy code in the subprocess.py library in Python 2.7 and Python 3.1.

As suggested in Python Bug #2320, pass close_fds=True everywhere you call subprocess.Popen(). In fact, they made it the default in Python 3.2, while also fixing the underlying race-condition issue. See that ticket for more details.
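A minimal sketch of that call (the command is just an illustration; the point is the close_fds argument, which in Python 2.7 must be passed explicitly):

```python
import subprocess

# close_fds=True stops the child from inheriting every open file
# descriptor of the parent; without it (the Python 2.7 default),
# long-lived children pin descriptors the parent thinks it has closed.
proc = subprocess.Popen(
    ["echo", "hello"],
    stdout=subprocess.PIPE,
    close_fds=True,
)
out, _ = proc.communicate()
print(out.decode().strip())
```

Note that close_fds=True closes all descriptors except 0, 1, and 2 in the child, so any extra descriptor the child is meant to inherit must be handled separately.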

Answer 1 (score: 0)

I had left out some lines that destroy the job servers. Calling job_server.destroy() fixed the problem.
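The same discipline can be shown with plain subprocess workers (pp is not importable here, so this is a self-contained analogue: each Popen holds open pipe descriptors in the parent until it is reaped, just as pp worker subprocesses do until job_server.destroy() is called):

```python
import subprocess

# Launch a batch of worker processes; each one keeps a stdout pipe
# descriptor open in the parent until it is explicitly cleaned up.
procs = [
    subprocess.Popen(["echo", str(i)], stdout=subprocess.PIPE, close_fds=True)
    for i in range(5)
]

results = []
for p in procs:
    out, _ = p.communicate()  # drains the pipe, waits for exit, closes fds
    results.append(out.decode().strip())

print(results)
```

Skipping the cleanup loop in a long-running parent is exactly the pattern that exhausts the descriptor table; wrapping the cleanup in try/finally (as with job_server.destroy()) ensures it runs even when a job raises.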