Sharing a dictionary and an array across a process pool

Asked: 2015-05-22 19:27:06

Tags: python dictionary python-multiprocessing

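I have been trying to create a dictionary that uses a device's MAC ID as the key and holds the information for that MAC in a list of records. Something like this: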

{'00-00-0A-14-01-06': [['CMTS-51-55_10.20', '10.20.1.1', '342900', 'Cable6/0/0', '110', 'Cable6/0/0-upstream0', '129', 'Cable6/0/0-downstream', '00-00-0A-14-01-06', '10.20.1.6', '11', '1', '1424419744000', '692306', 'SignalingDown', '1', '118800000', '990000', '0', '0', '0', '342900'],
['CMTS-51-55_10.20', '10.20.1.1', '343800', 'Cable6/0/0', '110', 'Cable6/0/0-upstream0', '129', 'Cable6/0/0-downstream', '00-00-0A-14-01-06', '10.20.1.6', '11', '1', '1424420644000', '692306', 'SignalingDown', '1', '118800000', '990000', '0', '0', '0', '343800'], 
['CMTS-51-55_10.20', '10.20.1.1', '342900', 'Cable6/0/0', '110', 'Cable6/0/0-upstream0', '129', 'Cable6/0/0-downstream', '00-00-0A-14-01-06', '10.20.1.6', '11', '1', '1424419744000', '377773', 'SignalingUp', '2', '118800000', '990000', '0', '0', '0', '342900']]} 

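These values are retrieved from multiple files stored in multiple folders. A single folder can contain several files.

I feed this list of folders to a process pool, so that all the files in one folder are processed within one process.

I maintain a local dictionary (collections.defaultdict), fill it with the complete information, and then put that information into a shared dictionary (manager.dict), which I pass as an argument to the pool object.

I also pass a character array, used to share some template information between the child processes and the main process.

I am trying to test the sharing part of the multiprocessing code, but I can't seem to get it to work.

Could someone please help me?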

#!/usr/local/bin/pypy

from multiprocessing import Process
from multiprocessing import Pool, Manager ,Value, Array
import collections
from collections import defaultdict
import itertools
import os

def info(title):
    print title
    print 'module name:', __name__
    if hasattr(os, 'getppid'):  # only available on Unix
        print 'parent process:', os.getppid()
    print 'process id:', os.getpid()

def f(template,mydict):
    name = 'bob'
    info('function f')
    resultDeltaArray = collections.defaultdict(list)
    resultDeltaArray['b'].append("hi")
    resultDeltaArray['b'].append("bye")
    resultDeltaArray['c'].append("bye")
    resultDeltaArray['c'].append("bye")
    template = "name"
    print resultDeltaArray
    #print "templaate1", template
    for k,v in resultDeltaArray.viewitems():
        mydict[k] = v
    print 'hello', name
    #mydict = resultDeltaArray
    for k,v in mydict.items():
        print mydict[k]
        #del mydict[k]

if __name__ == '__main__':
    info('main line')
    manager = Manager()
    mydict = manager.dict()
    template = Array('c',50)
    #mydict[''] = []
    #print mydict
    todopool = Pool(2)
    todopool.map_async(f, itertools.repeat(template),itertools.repeat(mydict))
    #print "hi"
    #p = Process(target=f, args=('bob',template,mydict))
    #p.start()
    #p.join()
    print mydict
    mydict.clear()
    print mydict

    print "template2", template

This code only tests the multiprocessing part; it is not the actual implementation. In this case it just hangs and does nothing after printing:

main line
module name: __main__
parent process: 27301
process id: 27852

When I try to interrupt the process with Ctrl-C, it gets stuck again after printing:

Traceback (most recent call last):
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/process.py", line 258, in _bootstrap
  Process PoolWorker-2:
Traceback (most recent call last):
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/pool.py", line 85, in worker
    self.run()
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/pool.py", line 85, in worker
    task = get()
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/queues.py", line 374, in get
    racquire()
KeyboardInterrupt
    task = get()
  File "/home/pydev/checkouts/dev/trunk/thirdparty/pypy_2.1/lib-python/2.7/multiprocessing/queues.py", line 376, in get
    return recv()

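Am I using things the right way? Does the Pool object not accept a multiprocessing Array or a manager.dict as arguments? Is there another way to do the same thing?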

1 Answer:

Answer 0 (score: 2)

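Dicts (implemented as in-memory hash tables) are not designed in a way that lends itself to sharing between processes, which by their nature do not share memory.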
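That said, a manager.dict proxy can be pickled and handed to pool workers, so the hang in the question probably has a different cause: Pool.map_async takes a single iterable (its third positional parameter is chunksize), and itertools.repeat never ends, so the parent likely blocks while trying to materialize it. A shared Array, by contrast, generally has to be inherited when the workers are created rather than sent as a task argument. The following is only a minimal sketch under those assumptions; scan_folder, merge_into and the folder names are hypothetical placeholders, not the asker's real code.

```python
from collections import defaultdict
from multiprocessing import Manager, Pool

def merge_into(shared, local):
    """Store each finished list with one assignment per key.

    Appending to a list fetched from a manager dict changes only a local
    copy, so the whole list is reassigned. This read-then-write is not
    atomic; guard it with a manager.Lock() if two tasks can hit one key.
    """
    for key, rows in local.items():
        shared[key] = shared.get(key, []) + rows

def scan_folder(args):
    """Hypothetical worker: one task per folder, bundled as a tuple."""
    shared, folder = args
    local = defaultdict(list)
    local[folder].append('hi')    # placeholder for real parsed records
    local[folder].append('bye')
    merge_into(shared, local)

if __name__ == '__main__':
    manager = Manager()
    shared = manager.dict()
    folders = ['folder_a', 'folder_b']   # a finite list, not itertools.repeat
    pool = Pool(2)
    pool.map(scan_folder, [(shared, folder) for folder in folders])
    pool.close()
    pool.join()
    print(dict(shared))
```

Every access through the proxy is a round trip to the manager process, so building the data locally and writing it once per key, as above, keeps that overhead down.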

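Consider using threads, which do share memory, perhaps with from multiprocessing.pool import ThreadPool as Pool. Or use an alternative structure such as shelve (a persistent, shareable data store). Or use sqlite3 to let multiple processes access the same shared database. Or install and use memcached or some other data store designed to be shared across processes.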
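As a rough illustration of the thread-based option, the sketch below has each worker build a local defaultdict and merge it into one ordinary dict under a lock. The collect and scan_folder names and the placeholder record are assumptions made for the example; real code would parse the files in each folder instead. Bear in mind that threads share the interpreter lock, so this mainly helps when the work is I/O-bound.

```python
from collections import defaultdict
from multiprocessing.pool import ThreadPool
from threading import Lock

def collect(folders, workers=2):
    """Run one task per folder in a thread pool and merge the results."""
    shared = defaultdict(list)   # ordinary dict, visible to every thread
    lock = Lock()

    def scan_folder(folder):
        # Hypothetical worker: stands in for reading every file in `folder`.
        local = defaultdict(list)
        local['00-00-0A-14-01-06'].append([folder, 'SignalingDown'])
        with lock:               # merge the local results in one step
            for mac, rows in local.items():
                shared[mac].extend(rows)

    pool = ThreadPool(workers)
    try:
        pool.map(scan_folder, folders)
    finally:
        pool.close()
        pool.join()
    return dict(shared)

if __name__ == '__main__':
    result = collect(['folder_a', 'folder_b'])
    print(len(result['00-00-0A-14-01-06']))
```

Because nothing is pickled, the worker can be a closure over the shared dict and the lock, which is not possible with a process pool.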

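The documentation also shows how to share data across processes using queues and pipes, but that may not be what you want (a shared key/value store): https://docs.python.org/2.7/library/multiprocessing.html#exchanging-objects-between-processes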
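If a queue turns out to be acceptable, one possible shape is for each worker to send its finished local dict back and for the parent to do the merging. This is only a sketch: scan_folder, drain, the folder names and the placeholder record are hypothetical and are not taken from the question.

```python
from collections import defaultdict
from multiprocessing import Process, Queue

def scan_folder(folder, results):
    """Hypothetical worker: build a local dict, then send it to the parent."""
    local = defaultdict(list)
    local['00-00-0A-14-01-06'].append([folder, 'SignalingDown'])  # placeholder
    results.put(dict(local))     # a plain dict pickles cleanly

def drain(results, expected):
    """Merge `expected` worker dicts taken from the queue into one dict."""
    merged = defaultdict(list)
    for _ in range(expected):
        for mac, rows in results.get().items():
            merged[mac].extend(rows)
    return dict(merged)

if __name__ == '__main__':
    folders = ['folder_a', 'folder_b']
    results = Queue()
    procs = [Process(target=scan_folder, args=(folder, results))
             for folder in folders]
    for p in procs:
        p.start()
    merged = drain(results, len(folders))   # drain before joining the workers
    for p in procs:
        p.join()
    print(merged)
```

Only the parent ever touches the merged dictionary here, so no locking is needed; the trade-off is that workers cannot see each other's entries while they run.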