I am trying to store a large matrix in shared memory so that each Pool worker can use it without copying it. I followed a simple example from here, but I see no speedup compared with feeding the array in directly.
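For reference, here is a minimal sketch of the direct-feed baseline I am comparing against (assuming the workers simply read a module-level global; on fork-based start methods each child inherits it rather than receiving it through an initializer):

import numpy as np
from multiprocessing import Pool

data = np.random.randn(1000, 100000)  # module-level global, visible to the workers

def worker_func(i):
    # Reads `data` directly from the inherited global namespace.
    # do stuff #
    return i

with Pool(processes=4) as pool:
    result = pool.map(worker_func, range(50))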
Now, using a shared RawArray:
import numpy as np
import time
from multiprocessing import Pool, RawArray

# A global dictionary storing the variables passed from the initializer.
var_dict = {}

def init_worker(X, X_shape):
    # Using a dictionary is not strictly necessary. You can also
    # use global variables.
    var_dict['X'] = X
    var_dict['X_shape'] = X_shape

def worker_func(i):
    # Wrap the shared buffer as a numpy array (this makes no copy).
    X_np = np.frombuffer(var_dict['X']).reshape(var_dict['X_shape'])
    # do stuff #
    return i

X_shape = (1000, 100000)
# Randomly generate some data
data = np.random.randn(*X_shape)
X_shared = RawArray('d', X_shape[0] * X_shape[1])
# Wrap X as a numpy array so we can easily manipulate its data.
X_np = np.frombuffer(X_shared).reshape(X_shape)
# Copy data to our shared array.
np.copyto(X_np, data)
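As a quick sanity check (a sketch, not part of the timing) that the wrapper really is zero-copy, writes through the numpy view should show up in the underlying RawArray:

# Writing through the view mutates the shared buffer in place.
X_np[0, 0] = 42.0
assert X_shared[0] == 42.0
X_np[0, 0] = data[0, 0]  # restore the original value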
Compared with feeding the data in directly, this provides no speedup:
# Start the process pool and do the computation.
# Here we pass X and X_shape to the initializer of each worker.
# (Because X_shape is not a shared variable, it will be copied to each
# child process.)
with Pool(processes=4, initializer=init_worker, initargs=(X_shared, X_shape)) as pool:
    result = pool.map(worker_func, range(50))
    print('Results (pool):\n', np.array(result))
# Should print the same results.
print('Results (numpy):\n', np.sum(X_np, 1))
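I time both variants the same way; a minimal sketch using the `time` module imported above (only the pool setup differs between the two runs):

t0 = time.time()
with Pool(processes=4, initializer=init_worker, initargs=(X_shared, X_shape)) as pool:
    pool.map(worker_func, range(50))
print('elapsed (shared):', time.time() - t0)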
Why is this the case?