Python multiprocessing claims too many open files when no files are opened at all

Asked: 2016-04-13 11:21:18

Tags: python ctypes python-multiprocessing

I am trying to speed up an algorithm that works on a huge matrix. I parallelized it to operate on rows, and put the data matrix in shared memory so the system doesn't get clogged. However, instead of working as smoothly as I hoped, it now raises a weird error about files, which I don't understand, since I don't even open any files in the program.

Here is a rough outline of what happens in the program, with the 1000-iteration loop standing in for what goes on in the actual algorithm:

import multiprocessing
import ctypes
import numpy as np

shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)

def my_func(i, shared_array):
    shared_array[i,:] = i

def pool_init(_shared_array, _constans):
    global shared_array, constans
    shared_array = _shared_array
    constans = _constans

def pool_my_func(i):
    my_func(i, shared_array)

if __name__ == '__main__':
    for i in np.arange(1000):
        pool = multiprocessing.Pool(8, pool_init, (shared_array, 4))
        pool.map(pool_my_func, range(10))
    print(shared_array)

This raises the following error (I am on OSX):

Traceback (most recent call last):
  File "weird.py", line 24, in <module>
    pool = multiprocessing.Pool(8, pool_init, (shared_array, 4))
  File "//anaconda/lib/python3.4/multiprocessing/context.py", line 118, in Pool
    context=self.get_context())
  File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "//anaconda/lib/python3.4/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "//anaconda/lib/python3.4/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 21, in __init__
    self._launch(process_obj)
  File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 69, in _launch
    parent_r, child_w = os.pipe()
OSError: [Errno 24] Too many open files

I'm stumped. I don't even open any files here. All I want to do is pass shared_array to the individual processes in a way that won't clog system memory; I don't even need to modify it within the parallelized part, if that helps at all.

Also, in case it matters, the exact error raised by the actual code itself is slightly different:

Traceback (most recent call last):
  File "tcap.py", line 206, in <module>
  File "tcap.py", line 202, in main
  File "tcap.py", line 181, in tcap_cluster
  File "tcap.py", line 133, in ap_step
  File "//anaconda/lib/python3.4/multiprocessing/context.py", line 118, in Pool
  File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 168, in __init__
  File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 233, in _repopulate_pool
  File "//anaconda/lib/python3.4/multiprocessing/process.py", line 105, in start
  File "//anaconda/lib/python3.4/multiprocessing/context.py", line 267, in _Popen
  File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 21, in __init__
  File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 69, in _launch
OSError: [Errno 24] Too many open files

So yeah, I have no idea what to do. Any help would be greatly appreciated. Thanks in advance!

1 Answer:

Answer 0 (score: 6):

You are creating 1000 process pools and never reclaiming them (the old ones are simply abandoned when `pool` is rebound on the next iteration). Each pool keeps open, in the main process, the pipes it uses to communicate with its workers, so the abandoned pools accumulate file descriptors until `os.pipe()` fails with `Errno 24`.

What you presumably want instead is to create the pool once and reuse it across iterations:

pool = multiprocessing.Pool(8, pool_init, (shared_array, 4))
for _ in range(1000):
    pool.map(pool_my_func, range(10))
pool.close()
pool.join()