Filling a dictionary in parallel with Python 3

Posted: 2016-02-22 20:17:11

Tags: python python-3.x python-multiprocessing

I want to fill a dictionary in a loop. The iterations of the loop are independent of each other, and I want to run them on a cluster with thousands of processors. Here is a simplified version of what I have tried and what I need to do.

import multiprocessing

class Worker(multiprocessing.Process):
    def setName(self, name):
        self.name = name
    def run(self):
        print('In %s' % self.name)
        return

if __name__ == '__main__':
    jobs = []
    names = dict()
    for i in range(10000):
        p = Worker()
        p.setName(str(i))
        names[str(i)] = i
        jobs.append(p)
        p.start()
    for j in jobs:
        j.join()

I tried this in Python 3 on my own computer and got the following error:

    ..
    In 249
    Traceback (most recent call last):
      File "test.py", line 16, in <module>
        p.start()
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/process.py", line 105, in start
    In 250
        self._popen = self._Popen(self)
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/context.py", line 212, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
        return Popen(process_obj)
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
        self._launch(process_obj)
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/popen_fork.py", line 66, in _launch
        parent_r, child_w = os.pipe()
    OSError: [Errno 24] Too many open files

Is there a better way to do this?

2 Answers:

Answer 0 (score: 3)

multiprocessing communicates with its subprocesses via pipes. Each subprocess requires two open file descriptors, one for reading and one for writing. If you launch 10000 workers, you will end up with 20000 open file descriptors, which exceeds the default limit on OS X (which your paths indicate you are using).

You can fix the problem by raising the limit. See https://superuser.com/questions/433746/is-there-a-fix-for-the-too-many-open-files-in-system-error-on-os-x-10-7-1 for details - basically it amounts to setting two sysctl knobs and raising the shell's ulimit setting.
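If you would rather raise the per-process soft limit from inside Python, a minimal sketch using the standard resource module is shown below; the target of 20000 descriptors is only illustrative, and the hard limit plus the system-wide sysctl settings still cap what setrlimit will accept:

import resource

# Current (soft, hard) limits on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('soft=%d hard=%d' % (soft, hard))

# Try to raise the soft limit; 20000 is only an illustrative target for
# ~10000 workers (two pipe ends each). setrlimit raises ValueError or
# OSError if the hard limit or the OS-level limits do not allow it.
try:
    resource.setrlimit(resource.RLIMIT_NOFILE, (20000, hard))
except (ValueError, OSError) as exc:
    print('could not raise RLIMIT_NOFILE: %s' % exc)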

Answer 1 (score: 1)

You are currently spawning 10000 processes at once, which is really not a good idea. The error you are seeing is almost certainly there because the multiprocessing module (apparently) uses pipes for Inter Process Communication, and there is a limit on open pipes/file descriptors.

I would suggest using a Python interpreter without the Global Interpreter Lock, such as Jython or IronPython, and simply replacing the multiprocessing module with the threading one.
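To illustrate, here is a minimal threading sketch of the original loop; the thread count of 100 and the worker function are placeholders, and the lock is only there because dictionary updates are not guaranteed to be atomic on every interpreter. Since threads share memory, they can fill the dictionary directly:

import threading

names = dict()
lock = threading.Lock()

def worker(i):
    # Threads share the interpreter's memory, so they can write into the
    # shared dictionary directly; the lock guards concurrent updates.
    with lock:
        names[str(i)] = i
    print('In %s' % i)

if __name__ == '__main__':
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(100)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(names))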

If you still want to use the multiprocessing module, you can use a process pool like this to collect the return values:

from multiprocessing import Pool

def worker(params):
    name, someArg = params
    print ('In %s' % name)
    # do something with someArg here
    return (name, someArg)

if __name__ == '__main__':
    # Spawn 100 worker processes
    pool = Pool(processes=100)
    # Fill with real data
    task_dict = dict(('name_{}'.format(i), i) for i in range(1000))
    # Process every task via our pool
    results = pool.map(worker, task_dict.items())
    # And convert the result to a dict
    results = dict(results)
    print(results)

This also works with the threading module with only minimal changes.
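As a sketch of those minimal changes, multiprocessing.dummy exposes the same Pool API backed by threads, so the example above only needs a different import (the worker function and the placeholder data are the same assumptions as before):

from multiprocessing.dummy import Pool  # thread-backed version of multiprocessing.Pool

def worker(params):
    name, someArg = params
    return (name, someArg)

if __name__ == '__main__':
    task_dict = dict(('name_{}'.format(i), i) for i in range(1000))
    with Pool(processes=100) as pool:
        results = dict(pool.map(worker, task_dict.items()))
    print(len(results))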