Question

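I have a large matrix with millions of rows and hundreds of columns. The first n rows (about 100K) are reference rows, and for each of the other rows I want to find the k (about 10) nearest neighbors among the reference vectors using scipy cdist.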
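To make the goal concrete, here is a minimal single-process sketch of the nearest-neighbor step on a small random matrix. The helper name `k_nearest` and the tiny sizes are only for illustration; it assumes k is no larger than the number of reference rows.

```python
import numpy as np
from scipy.spatial.distance import cdist

def k_nearest(refs, queries, k, metric="cosine"):
    """Return, per query row, the indices (into refs) of its k nearest reference rows."""
    d = cdist(queries, refs, metric=metric)  # shape: (n_queries, n_refs)
    # argpartition moves the k smallest distances into the first k columns (unordered).
    idx = np.argpartition(d, k - 1, axis=1)[:, :k]
    # Order those k candidates by their actual distance, nearest first.
    rows = np.arange(d.shape[0])[:, None]
    order = np.argsort(d[rows, idx], axis=1)
    return idx[rows, order]

rng = np.random.default_rng(0)
m = rng.random((60, 8))   # stand-in for the real matrix
n_ref = 20                # the first n_ref rows are the reference rows
nearest = k_nearest(m[:n_ref], m[n_ref:], k=3)
```

Using `argpartition` avoids fully sorting every row of the distance matrix, which matters when there are about 100K reference rows per query.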

I create a multiprocessing.sharedctypes.Array from the matrix, then use asarray and slicing to split the matrix and compute the distances with cdist.

My current code is as follows:

import numpy
from multiprocessing import Pool, sharedctypes
from scipy.spatial.distance import cdist

shared_m = None

def generate_sample():
    m = numpy.random.random((20000, 640))
    shape = m.shape
    global shared_m
    shared_m = sharedctypes.Array('d', m.flatten(), lock=False)
    return shape

def dist(A, B, metric):
    return cdist(A, B, metric=metric)

def get_closest(args):
    shape, lenA, start_B, end_B, metric, numres = args
    m = numpy.ctypeslib.as_array(shared_m)
    m.shape = shape
    A = numpy.asarray(m[:lenA,:], dtype=numpy.double)
    B = numpy.asarray(m[start_B:end_B,:], dtype=numpy.double)
    distances = dist(B, A, metric)
    # rest of code to find closests

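def p_get_closest(shape, lenA, lenB, metric="cosine", sample_size=1000, numres=10):
    p = Pool(4)
    # Query rows start right after the lenA reference rows; walk them in
    # chunks of sample_size rows (the last chunk may be shorter).
    args = [(shape, lenA, start, min(start + sample_size, lenA + lenB), metric, numres)
            for start in range(lenA, lenA + lenB, sample_size)]
    res = p.map_async(get_closest, args)
    return res.get()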

def main():
    shape = generate_sample()
    p_get_closest(shape, 5000, shape[0] - 5000, "cosine", 3000, 10)

if __name__ == "__main__":
    main()

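My problem now is that the parallel cdist calls somehow block each other. (Maybe I am using the word "block" incorrectly. The point is that the cdist computations do not run in parallel.)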

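I tried to trace the problem by adding printouts to scipy/spatial/distance.py and scipy/spatial/src/distance.c to see where the run blocks. It looks like no data is being copied; the dtype arguments take care of that.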
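The no-copy observation can also be checked from the Python side. The sketch below uses a small RawArray and `numpy.frombuffer` (an assumption, chosen for brevity; the code above uses `numpy.ctypeslib.as_array`) and asks NumPy whether the slices still share memory with the shared buffer. It says nothing about what cdist does internally.

```python
import numpy as np
from multiprocessing import sharedctypes

rows, cols = 6, 4
buf = sharedctypes.RawArray('d', rows * cols)  # zero-initialised shared buffer
m = np.frombuffer(buf, dtype=np.float64).reshape(rows, cols)
m[:] = np.arange(rows * cols).reshape(rows, cols)

# Same slicing pattern as in get_closest: the dtype already matches,
# so asarray is expected to return a view rather than a copy.
A = np.asarray(m[:2, :], dtype=np.double)
B = np.asarray(m[2:, :], dtype=np.double)

a_is_view = np.shares_memory(A, m)
b_is_view = np.shares_memory(B, m)
a_contiguous = A.flags['C_CONTIGUOUS']

# Writing through the ctypes buffer is visible through the slice.
buf[0] = 99.0
first_value = A[0, 0]
```

If `a_is_view` or `b_is_view` came out False for the real matrix, a copy would be happening before cdist is even called.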

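Putting a printf into distance.c:cdist_cosine() shows that all processes reach the point where the actual computation starts (just before the for loop), but the computations do not run in parallel.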

I tried many things, such as using lock=True when creating the multiprocessing.sharedctypes.Array, or using RawArray instead of Array.

I have no idea what I am doing wrong or how to investigate the problem.
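One way to investigate without touching the C code is to have each worker report when it ran and on which CPUs it was allowed to run. This is only a sketch: `timed_task` and `max_concurrency` are hypothetical helpers, and `os.sched_getaffinity` exists only on some platforms (for example Linux), so the code falls back to None elsewhere.

```python
import os
import time

def timed_task(task_id, work):
    """Run work() and report which process ran it, when, and on which CPUs."""
    start = time.time()
    work()
    end = time.time()
    # CPU affinity is not available on every platform; report None if missing.
    if hasattr(os, "sched_getaffinity"):
        cpus = sorted(os.sched_getaffinity(0))
    else:
        cpus = None
    return {"task": task_id, "pid": os.getpid(),
            "start": start, "end": end, "cpus": cpus}

def max_concurrency(reports):
    """Largest number of tasks whose [start, end) intervals overlap."""
    events = []
    for r in reports:
        events.append((r["start"], 1))
        events.append((r["end"], -1))
    # At equal timestamps, handle ends (-1) before starts (+1).
    events.sort(key=lambda e: (e[0], e[1]))
    running = best = 0
    for _, delta in events:
        running += delta
        best = max(best, running)
    return best
```

Inside get_closest, the dist call could be wrapped as `timed_task(start_B, lambda: dist(B, A, metric))` and the report returned, so that `max_concurrency(res.get())` can be computed in the parent. A result of 1 with Pool(4) would confirm that the workers are serialized, and a single CPU in every worker's `cpus` list would point at a restricted affinity mask rather than at cdist itself.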

scipy parallel cdist with multiprocessing

0 answers: