Question

I am trying to use concurrent.futures.ProcessPoolExecutor in Python 3 to process a large matrix in parallel. The general structure of the code is:

from concurrent.futures import ProcessPoolExecutor, as_completed

class X(object):
    def __init__(self, matrix):
        self.matrix = matrix  # large scipy csr_matrix

    def f(self, i, row_i):
        ...  # <cpu-bound process>

    def fetch_multiple(self, ids):
        with ProcessPoolExecutor() as executor:
            futures = [executor.submit(self.f, i, self.matrix.getrow(i)) for i in ids]
            return [f.result() for f in as_completed(futures)]

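self.matrix is a large scipy csr_matrix. f is my concurrent function; it takes one row of self.matrix and applies a CPU-bound process to it. Finally, fetch_multiple is a function that runs multiple instances of f in parallel and returns the results.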

The problem is that when I run the script, every CPU core is less than 50% busy.

Why aren't all the cores busy?

I think the problem is the large self.matrix object and the passing of row vectors between processes. How can I solve this?

Answer 1

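Yes. The overhead should not be that large, but it is likely the reason your CPUs appear idle (although they should be busy passing the data around anyway).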
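One likely contributor to that overhead, assuming f is submitted as the bound method self.f as in the question: pickling a bound method also pickles the instance it is bound to, so every submitted task carries a full copy of self.matrix. The sketch below illustrates the effect with the standard library only. Holder, payload and g are hypothetical stand-ins (for class X, self.matrix and a module-level version of f), not names from the original code.

```python
import pickle


class Holder:
    """Hypothetical stand-in for class X; payload plays the role of self.matrix."""

    def __init__(self, payload):
        self.payload = payload

    def f(self, i):
        return i


def g(i):
    """Module-level equivalent of f that carries no instance state."""
    return i


holder = Holder(bytes(10_000_000))  # about 10 MB of dummy data

# Pickling the bound method serializes the whole instance, payload included.
bound_size = len(pickle.dumps(holder.f))
# Pickling a module-level function only stores a reference to its name.
plain_size = len(pickle.dumps(g))

print(f"bound method: {bound_size:,} bytes; plain function: {plain_size:,} bytes")
```

ProcessPoolExecutor pickles the callable and its arguments for each task, so submitting a module-level function that receives only the row (or only a row index) avoids re-sending the matrix with every call.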

Either way, try the recipe linked here, which uses shared memory to pass a "pointer" to the object to the subprocesses.

http://briansimulator.org/sharing-numpy-arrays-between-processes/

Quoting from there:

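import numpy
from multiprocessing import sharedctypes

# S is assumed to be an existing C-contiguous float64 numpy array
size = S.size
shape = S.shape
S.shape = size  # flatten in place so RawArray can copy it element by element
S_ctypes = sharedctypes.RawArray('d', S)
# Rebind S to a numpy view backed by the shared buffer
S = numpy.frombuffer(S_ctypes, dtype=numpy.float64, count=size)
S.shape = shape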

Now we can send S_ctypes and shape to a child process in multiprocessing, and convert it back to a numpy array in the child process as follows:

from numpy import ctypeslib
S = ctypeslib.as_array(S_ctypes)
S.shape = shape

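Handling the reference counting ought to be tricky, but I suppose numpy.ctypeslib takes care of that. So just coordinate passing the actual row numbers to the subprocesses so that they do not work on the same data.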

Why is the performance of concurrent.futures.ProcessPoolExecutor so low?
