This code shows the structure of what I am trying to do.
import multiprocessing
from foo import really_expensive_to_compute_object

## Create a really complicated object that is *hard* to initialise.
T = really_expensive_to_compute_object(10)

def f(x):
    return T.cheap_calculation(x)

P = multiprocessing.Pool(processes=64)
results = P.map(f, range(1000000))
print(results)
The problem is that each process spends a lot of time recomputing T instead of using the original T that was computed once. Is there a way to prevent this? T has a fast (deep) copy method, so can I make Python use that instead of recomputing?
Answer 0 (score: 2)
Explicitly pass the resource to the child processes.
Your code can then be rewritten as:
import multiprocessing
import time
import functools

class really_expensive_to_compute_object(object):
    def __init__(self, arg):
        print('expensive creation')
        time.sleep(3)

    def cheap_calculation(self, x):
        return x * 2

def f(T, x):
    return T.cheap_calculation(x)

if __name__ == '__main__':
    ## Create a really complicated object that is *hard* to initialise.
    T = really_expensive_to_compute_object(10)
    ## helper, to pass the expensive object to the function
    f_helper = functools.partial(f, T)
    # I've reduced the count for tests
    P = multiprocessing.Pool(processes=4)
    results = P.map(f_helper, range(100))
    print(results)
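If shipping T to the workers inside the pickled functools.partial is itself too costly, a related option is to build the expensive object once per worker process using Pool's initializer argument. The following is a minimal sketch of that idea, not part of the original answer; init_worker is a hypothetical helper name.

import multiprocessing
from foo import really_expensive_to_compute_object

T = None  # per-worker global, filled in by the initializer below

def init_worker(arg):
    # Runs once in each worker process: build the expensive object there,
    # instead of pickling it and sending it along with the tasks.
    global T
    T = really_expensive_to_compute_object(arg)

def f(x):
    return T.cheap_calculation(x)

if __name__ == '__main__':
    P = multiprocessing.Pool(processes=4, initializer=init_worker, initargs=(10,))
    results = P.map(f, range(100))
    print(results)

With this layout T is computed once per worker (here 4 times) rather than once per task, and the map call itself only has to pickle small integers.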
Answer 1 (score: 1)
Why not have f take a T parameter instead of referencing a global, and do the copying yourself?
import multiprocessing, copy
from foo import really_expensive_to_compute_object

## Create a really complicated object that is *hard* to initialise.
T = really_expensive_to_compute_object(10)

def f(t, x):
    return t.cheap_calculation(x)

P = multiprocessing.Pool(processes=64)
# Pool.starmap (Python 3.3+) unpacks each (copy-of-T, x) pair into f's two arguments
results = P.starmap(f, ((copy.deepcopy(T), x) for x in range(1000000)))
print(results)