Question

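I'm trying to use parallel processing in Python to speed up part of my code, but I can't get it to work, and I can't even find an example that is relevant to my case.

The code produces a low-polygon version of an image using Delaunay triangulation, and the part that is slowing me down is finding the mean value of each triangle.

I've already been able to speed things up by vectorizing my code, but I'm hoping to gain more through parallelization.

The code I'm having trouble with is a very simple for loop: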

for tri in tris:
    lopo[tridex==tri,:] = np.mean(hipo[tridex==tri,:],axis=0)

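The variables referenced are as follows.

tris - a Python list of the unique indices of all the triangles

lopo - a Numpy array holding the final low-polygon version of the image

hipo - a Numpy array holding the original image

tridex - a Numpy array the same size as the image. Each element represents a pixel and stores the index of the triangle that the pixel lies in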
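
To make these descriptions concrete, here is a minimal sketch of the loop on made-up data. The 4x4 RGB image, the two-triangle layout, and the helper name color_image_serial are assumptions for illustration only, not part of the actual code.

```python
import numpy as np

def color_image_serial(hipo, tridex, tris):
    """Fill each triangle with the mean colour of its pixels (the loop from the question)."""
    lopo = np.zeros_like(hipo, dtype=float)
    for tri in tris:
        mask = tridex == tri            # boolean mask of the pixels in this triangle
        lopo[mask, :] = np.mean(hipo[mask, :], axis=0)
    return lopo

# Hypothetical toy data: a 4x4 RGB image split into two "triangles" (0 and 1).
hipo = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
tridex = np.zeros((4, 4), dtype=int)
tridex[:, 2:] = 1                       # right half of the image belongs to triangle 1
tris = list(np.unique(tridex))

lopo = color_image_serial(hipo, tridex, tris)
```

Every pass through the loop rebuilds the mask tridex == tri over the whole image, so the cost grows with the number of triangles times the number of pixels.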

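I can't seem to find a good example that uses multiple numpy arrays as input, with one of them shared.

I have tried multiprocessing (with the snippet above wrapped in a function called colorImage):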

p = Process(target=colorImage, args=(hipo,lopo,tridex,ppTris))
p.start()
p.join()

But I get a broken pipe error right away.

Answer 1

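So the way Python's multiprocessing works (for the most part) is that you have to specify the individual threads that you want to run. I wrote a brief introductory tutorial here: http://will-farmer.com/parallel-python.html

In your case, I'd recommend splitting tris into a number of parts, each the same size, with each part representing one "worker". You can split the list with numpy.split() (documentation here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.split.html).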
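
One caveat when splitting: numpy.split() raises a ValueError unless the array divides evenly into the requested number of sections, whereas numpy.array_split() allows uneven chunks. A minimal sketch, assuming a made-up list of 10 triangle indices and 4 workers:

```python
import numpy as np

tris = np.arange(10)                 # hypothetical: 10 triangle indices

# np.split requires an even division and raises ValueError otherwise
try:
    np.split(tris, 4)
    split_ok = True
except ValueError:
    split_ok = False

# np.array_split accepts any number of sections; chunk sizes differ by at most 1
chunks = np.array_split(tris, 4)
sizes = [len(c) for c in chunks]
```

If the number of triangles is not known to be a multiple of the worker count, numpy.array_split() is probably the safer choice.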

Then, for each list in tri_lists, we use the threading and queue modules to set up 8 workers.

import queue
import threading

import numpy as np

# Split into 8 lists; array_split also works when len(tris) is not divisible by 8
tri_lists = np.array_split(tris, 8)
# Queues are thread-safe
return_values = queue.Queue()
threads = []

def color_image(q, tris, hipo, tridex):
    """This is the function we're parallelizing."""
    for tri in tris:
        # Store the triangle index with its mean so results can be matched up later
        q.put((tri, np.mean(hipo[tridex == tri, :], axis=0)))

# Now we run the jobs
for i in range(8):
    t = threading.Thread(
        target=color_image,
        args=(return_values, tri_lists[i], hipo, tridex))
    t.start()
    threads.append(t)
# Wait for all workers to finish
for t in threads:
    t.join()

# Now we have to clean up our results
# Drain the queue and set the values in lopo
while not return_values.empty():
    tri, mean = return_values.get()
    lopo[tridex == tri, :] = mean

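This isn't the cleanest way to do it, and I'm not sure it works since I can't test it, but it is a decent way to go about it. The parallelized part is now np.mean(), while setting the values is not parallelized.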
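
As a variation on the same idea, the thread and queue bookkeeping can be handed to concurrent.futures.ThreadPoolExecutor from the standard library. This is only a sketch under assumptions: the toy arrays and the helper names chunk_means and color_image_threaded are made up here, and whether threads give a real speedup depends on how much of the work NumPy does outside the GIL, so it is worth timing on real data.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def chunk_means(chunk, hipo, tridex):
    """Return (triangle index, mean colour) pairs for one chunk of triangles."""
    return [(tri, hipo[tridex == tri, :].mean(axis=0)) for tri in chunk]

def color_image_threaded(hipo, tridex, tris, n_workers=4):
    """Compute the means in worker threads, then fill lopo in the main thread."""
    lopo = np.zeros_like(hipo, dtype=float)
    chunks = np.array_split(np.asarray(tris), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Submit one job per chunk; the means are computed in the worker threads
        futures = [pool.submit(chunk_means, c, hipo, tridex) for c in chunks]
        for fut in futures:
            # The assignment into lopo happens serially in the main thread
            for tri, mean in fut.result():
                lopo[tridex == tri, :] = mean
    return lopo

# Hypothetical toy data: an 8x8 RGB image with random triangle labels 0..4
rng = np.random.default_rng(0)
hipo = rng.random((8, 8, 3))
tridex = rng.integers(0, 5, size=(8, 8))
tris = np.unique(tridex)

lopo = color_image_threaded(hipo, tridex, tris)
```

Returning (triangle, mean) pairs keeps each result tied to its triangle, so the order in which the workers finish does not matter.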

If you also want to parallelize setting the values, you'll need a shared variable, either using the Queue or a global variable.
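
With threads (as opposed to separate processes), all workers see the same NumPy array in memory, so one way to sketch the shared-variable approach is to pass lopo to each worker and let it write its own triangles. This relies on the assumption that every pixel belongs to exactly one triangle, so no two workers ever write the same element. The helper name fill_chunk and the toy data are hypothetical.

```python
import threading

import numpy as np

def fill_chunk(chunk, hipo, tridex, lopo):
    """Compute each triangle's mean and write it straight into the shared lopo array."""
    for tri in chunk:
        mask = tridex == tri
        lopo[mask, :] = hipo[mask, :].mean(axis=0)

# Hypothetical toy data: an 8x8 RGB image with random triangle labels 0..5
rng = np.random.default_rng(1)
hipo = rng.random((8, 8, 3))
tridex = rng.integers(0, 6, size=(8, 8))
lopo = np.zeros_like(hipo)

# One thread per chunk of triangle indices; the chunks cover disjoint pixels
chunks = np.array_split(np.unique(tridex), 3)
threads = [
    threading.Thread(target=fill_chunk, args=(c, hipo, tridex, lopo))
    for c in chunks
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If triangles could overlap, the writes would need a threading.Lock or a return to the collect-then-assign pattern.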

For a shared global variable, see this post: Python Global Variable with thread

Parallelizing image processing with Numpy