I am trying to store a large matrix in shared memory so that each Pool worker can use it without copying it. I followed a simple example from here, but I see no speedup compared with feeding the array in directly.
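For reference, here is a minimal sketch of the direct-feed baseline I am comparing against (assuming the workers simply read a module-level global; on fork-based start methods each child inherits it rather than receiving it through an initializer):

import numpy as np
from multiprocessing import Pool

data = np.random.randn(1000, 100000)  # module-level global, visible to the workers

def worker_func(i):
    # Reads `data` directly from the inherited global namespace.
    # do stuff #
    return i

with Pool(processes=4) as pool:
    result = pool.map(worker_func, range(50))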
Now, using a shared RawArray:
import numpy as np
import time
from multiprocessing import Pool, RawArray

# A global dictionary storing the variables passed from the initializer.
var_dict = {}

def init_worker(X, X_shape):
    # Using a dictionary is not strictly necessary. You can also
    # use global variables.
    var_dict['X'] = X
    var_dict['X_shape'] = X_shape

def worker_func(i):
    # Wrap the shared buffer as a numpy array (this makes no copy).
    X_np = np.frombuffer(var_dict['X']).reshape(var_dict['X_shape'])
    # do stuff #
    return i

X_shape = (1000, 100000)
# Randomly generate some data
data = np.random.randn(*X_shape)
X_shared = RawArray('d', X_shape[0] * X_shape[1])
# Wrap X as a numpy array so we can easily manipulate its data.
X_np = np.frombuffer(X_shared).reshape(X_shape)
# Copy data to our shared array.
np.copyto(X_np, data)
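As a quick sanity check (a sketch, not part of the timing) that the wrapper really is zero-copy, writes through the numpy view should show up in the underlying RawArray:

# Writing through the view mutates the shared buffer in place.
X_np[0, 0] = 42.0
assert X_shared[0] == 42.0
X_np[0, 0] = data[0, 0]  # restore the original value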
Compared with feeding the data in directly, this provides no speedup:
# Start the process pool and do the computation.
# Here we pass X and X_shape to the initializer of each worker.
# (Because X_shape is not a shared variable, it will be copied to each
# child process.)
with Pool(processes=4, initializer=init_worker, initargs=(X_shared, X_shape)) as pool:
    result = pool.map(worker_func, range(50))
    print('Results (pool):\n', np.array(result))
# Should print the same results.
print('Results (numpy):\n', np.sum(X_np, 1))
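I time both variants the same way; a minimal sketch using the `time` module imported above (only the pool setup differs between the two runs):

t0 = time.time()
with Pool(processes=4, initializer=init_worker, initargs=(X_shared, X_shape)) as pool:
    pool.map(worker_func, range(50))
print('elapsed (shared):', time.time() - t0)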
Why is this the case?