I am trying to run parallel processing on a locally defined function, as follows:
import multiprocessing as mp
import numpy as np
import pdb

def testFunction():
    x = np.asarray( range(1,10) )
    y = np.asarray( range(1,10) )

    def myFunc( i ):
        return np.sum(x[0:i]) * y[i]

    p = mp.Pool( mp.cpu_count() )
    out = p.map( myFunc, range(0,x.size) )
    print( out )

if __name__ == '__main__':
    print( 'I got here' )
    testFunction()
When I do this, I get the following error:
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
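How can I use multiprocessing to parallelize over several arrays, as I am trying to do here? x and y must be defined inside the function; I don't want them to be global variables.
Any help is appreciated.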
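Answer 0 (score: 2)
Just make the processing function a global (module-level) function and pass it pairs of array values instead of referencing the arrays by index inside the function: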
import multiprocessing as mp
import numpy as np

def process(inputs):
    x, y = inputs
    return x * y

def main():
    x = np.asarray(range(10))
    y = np.asarray(range(10))
    with mp.Pool(mp.cpu_count()) as pool:
        out = pool.map(process, zip(x, y))
    print(out)

if __name__ == '__main__':
    main()
Output:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
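As a side note on why moving the function to module level helps: Pool.map has to pickle the function to send it to the worker processes, and pickle stores functions by name rather than by value. The sketch below only illustrates that difference. The names module_level, make_local and can_pickle are made up for this example, and the exact exception type raised for a local function differs between Python versions, so both likely candidates are caught.

```python
import pickle


def module_level(i):
    # Defined at module top level, so pickle can refer to it by its name.
    return i * 2


def make_local():
    def local(i):
        # Defined inside another function; pickle cannot look it up by name.
        return i * 2
    return local


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Which of the two is raised depends on the Python version.
        return False


if __name__ == '__main__':
    print(can_pickle(module_level))
    print(can_pickle(make_local()))
```

When run as a script, the first call should succeed and the second should fail, which is the same failure the question's myFunc runs into.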
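Update: given the new details, the arrays have to be shared between different processes. That is exactly what multiprocessing.Manager is for.
A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.
So the resulting code would look like this: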
from functools import partial
import multiprocessing as mp
import numpy as np

def process(i, x, y):
    return np.sum(x[:i]) * y[i]

def main():
    manager = mp.Manager()
    x = manager.Array('i', range(10))
    y = manager.Array('i', range(10))
    func = partial(process, x=x, y=y)
    with mp.Pool(mp.cpu_count()) as pool:
        out = pool.map(func, range(len(x)))
    print(out)

if __name__ == '__main__':
    main()
Output:
[0, 0, 2, 9, 24, 50, 90, 147, 224, 324]
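If the workers only need to read the arrays, a possible variant is to skip the Manager and bind ordinary NumPy arrays with functools.partial. This is a sketch under that read-only assumption, not a replacement for the answer above: the arrays are pickled and copied to the workers along with the tasks, which may become costly for large arrays, whereas every access through a Manager proxy is a round trip to the server process.

```python
from functools import partial
import multiprocessing as mp

import numpy as np


def process(i, x, y):
    # Same computation as above, but x and y are plain NumPy arrays here.
    # int(...) converts the NumPy scalar to a built-in int for clean printing.
    return int(np.sum(x[:i]) * y[i])


def main():
    # x and y stay local to main(); no global variables are needed.
    x = np.arange(10)
    y = np.arange(10)
    # partial binds the arrays; they are pickled and sent with the tasks.
    func = partial(process, x=x, y=y)
    with mp.Pool(mp.cpu_count()) as pool:
        out = pool.map(func, range(len(x)))
    print(out)


if __name__ == '__main__':
    main()
```

For the same inputs this should print the same list as the Manager version.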