Question

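Following up on my previous question [1], I would like to apply multiprocessing to matplotlib's griddata function. Is it possible to split the griddata work into 4 parts, one for each of my 4 cores? I need this to improve performance.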

For example, try the code below with different values of size:

import numpy as np
import matplotlib.mlab as mlab
import time

size = 500

Y = np.arange(size)
X = np.arange(size)
x, y = np.meshgrid(X, Y)
u = x * np.sin(5) + y * np.cos(5)
v = x * np.cos(5) + y * np.sin(5)
test = x + y

tic = time.clock()

test_d = mlab.griddata(
    x.flatten(), y.flatten(), test.flatten(), x+u, y+v, interp='linear')

toc = time.clock()

print('Time={0}'.format(toc-tic))

Answer 1

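I ran the example code below with Python 3.4.2, numpy 1.9.1 and matplotlib 1.4.2 on a MacBook Pro with 4 physical CPUs (as opposed to the "virtual" CPUs that the Mac hardware architecture also makes available for some use cases):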

import numpy as np
import matplotlib.mlab as mlab
import time
import multiprocessing

# This value should be set much larger than nprocs, defined later below
size = 500

Y = np.arange(size)
X = np.arange(size)
x, y = np.meshgrid(X, Y)
u = x * np.sin(5) + y * np.cos(5)
v = x * np.cos(5) + y * np.sin(5)
test = x + y

tic = time.clock()

test_d = mlab.griddata(
    x.flatten(), y.flatten(), test.flatten(), x+u, y+v, interp='linear')

toc = time.clock()

print('Single Processor Time={0}'.format(toc-tic))

# Put interpolation points into a single array so that we can slice it easily
xi = x + u
yi = y + v
# My example test machine has 4 physical CPUs
nprocs = 4
jump = int(size/nprocs)

# Enclose the griddata function in a wrapper which will communicate its
# output result back to the calling process via a Queue
def wrapper(x, y, z, xi, yi, q):
    test_w = mlab.griddata(x, y, z, xi, yi, interp='linear')
    q.put(test_w)

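# Note: this script relies on the 'fork' start method, which was the macOS
# default for the Python 3.4 used here. With 'spawn' (always on Windows, and
# the macOS default since Python 3.8) the process-launching code must be
# placed under an `if __name__ == '__main__':` guard.
# Measure the elapsed time for multiprocessing separately
ticm = time.clock()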

queue, process = [], []
for n in range(nprocs):
    queue.append(multiprocessing.Queue())
    # Handle the possibility that size is not evenly divisible by nprocs
    if n == (nprocs-1):
        finalidx = size
    else:
        finalidx = (n + 1) * jump
    # Define the arguments, dividing the interpolation variables into
    # nprocs roughly evenly sized slices
    argtuple = (x.flatten(), y.flatten(), test.flatten(),
                xi[:,(n*jump):finalidx], yi[:,(n*jump):finalidx], queue[-1])
    # Create the processes, and launch them
    process.append(multiprocessing.Process(target=wrapper, args=argtuple))
    process[-1].start()

# Initialize an array to hold the return value, and make sure that it is
# null-valued but of the appropriate size
test_m = np.asarray([[] for s in range(size)])
# Read the individual results back from the queues and concatenate them
# into the return array
for q, p in zip(queue, process):
    test_m = np.concatenate((test_m, q.get()), axis=1)
    p.join()

tocm = time.clock()

print('Multiprocessing Time={0}'.format(tocm-ticm))

# Check that the result of both methods is actually the same; should raise
# an AssertionError exception if assertion is not True
assert np.all(test_d == test_m)
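
The manual jump/finalidx bookkeeping in the loop above (where the last process absorbs any leftover columns) can also be written with np.array_split, which cuts an array into a requested number of nearly equal pieces along an axis even when the length is not evenly divisible. It spreads the remainder over the first pieces instead of giving it all to the last one. A minimal sketch, assuming the same column-wise split of the target grid as in the answer; split_columns is a hypothetical helper name, and no interpolation is done here:

```python
import numpy as np


def split_columns(xi, yi, nprocs):
    """Split the target grids into nprocs column blocks (hypothetical helper).

    np.array_split tolerates a column count that is not divisible by nprocs:
    the leading blocks get one extra column each, instead of the last block
    absorbing the whole remainder as in the answer's loop.
    """
    xi_blocks = np.array_split(xi, nprocs, axis=1)
    yi_blocks = np.array_split(yi, nprocs, axis=1)
    return list(zip(xi_blocks, yi_blocks))


size, nprocs = 10, 4
X = np.arange(size)
xi, yi = np.meshgrid(X, X)

chunks = split_columns(xi, yi, nprocs)
print([block_x.shape for block_x, _ in chunks])  # [(10, 3), (10, 3), (10, 2), (10, 2)]

# Concatenating the blocks along the same axis restores the original grid,
# which is what the answer does with the per-process results.
rebuilt = np.concatenate([block_x for block_x, _ in chunks], axis=1)
assert np.array_equal(rebuilt, xi)
```

Each (xi block, yi block) pair would then be passed to one worker, exactly as the answer passes xi[:,(n*jump):finalidx] and yi[:,(n*jump):finalidx].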

I got the following results:

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/matplotlib/tri/triangulation.py:110: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
  self._neighbors)
Single Processor Time=8.495998
Multiprocessing Time=2.249938

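I'm not sure what is causing the FutureWarning from triangulation.py (apparently my version of matplotlib doesn't like the input values originally provided in the question), but regardless, multiprocessing appeared to achieve the desired speedup of 8.50 / 2.25 = 3.8 (edit: see comments; this claim was later struck out), which is roughly the 4x we would expect on a machine with 4 CPUs. The assertion at the end also executes successfully, showing that the two methods produce the same answer, so despite the odd warning message I believe the code above is a valid solution.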
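
The speedup quoted above is simply the ratio of the two reported times, and dividing it by the number of processes gives the parallel efficiency. A small sketch of that arithmetic using the printed numbers; speedup_and_efficiency is a hypothetical helper, and the result inherits whatever is wrong with how those times were measured:

```python
def speedup_and_efficiency(t_serial, t_parallel, nprocs):
    """Return (speedup, parallel efficiency) for a serial vs. parallel run."""
    speedup = t_serial / t_parallel
    return speedup, speedup / nprocs


# Times reported in the output above, with nprocs = 4 as in the answer.
s, e = speedup_and_efficiency(8.495998, 2.249938, nprocs=4)
print('speedup={0:.2f}, efficiency={1:.2f}'.format(s, e))  # speedup=3.78, efficiency=0.94
```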

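EDIT: A commenter pointed out that both my solution and the code snippet posted by the original author probably use the wrong method, time.clock(), to measure execution time; he suggested using time.time() instead. I have come around to his point of view. (Digging further into the Python documentation, I'm still not convinced that even this solution is 100% correct, since newer versions of Python appear to have deprecated time.clock() in favor of time.perf_counter() and time.process_time(). But regardless, I do agree that whether or not time.time() is the most correct way to take this measurement, it is still probably more correct than the time.clock() I was using before.)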
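
Why the clock choice matters here: on Unix, time.clock() returned the processor (CPU) time of the calling process, similar to time.process_time() today. In the multiprocessing version the parent spends most of its time blocked in q.get() while the children do the work, and neither that waiting nor the children's CPU time is charged to the parent, so a CPU-time clock understates the elapsed time. The sketch below uses time.sleep() as a stand-in for that blocking wait (an illustrative assumption, not the answer's code) and compares it with time.perf_counter(), a wall-clock timer meant for measuring elapsed intervals:

```python
import time


def measure(fn):
    """Time fn() with a wall-clock timer and a CPU-time timer."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall0, time.process_time() - cpu0


# Stand-in for the parent blocking in q.get() while child processes work:
# the process is idle, so almost no CPU time is charged to it.
wall, cpu = measure(lambda: time.sleep(0.5))
print('wall={0:.2f}s cpu={1:.2f}s'.format(wall, cpu))
```

On a typical machine the wall-clock figure comes out at about 0.5 s while the CPU figure stays close to zero, which is the same kind of gap that made the parent's time.clock() reading look like a large speedup.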

Assuming the commenter is right, the roughly 4x speedup I thought I measured is actually wrong.

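However, this does not mean that the underlying code itself is not correctly parallelized; rather, it just means that parallelization doesn't actually help in this case: splitting the data and running it on multiple processors didn't improve anything. Why would that be? Other users have pointed out that, at least in numpy/scipy, some functions run on multiple cores and some don't, and figuring out which is which can be an extremely challenging research project for an end user.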

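Based on the results of this experiment, if my solution correctly parallelizes the work in Python but no further speedup is observed, then I would suggest the simplest likely explanation is that matplotlib probably also parallelizes some of its functions "under the hood", so to speak, in compiled C++ libraries, just as numpy/scipy already do. Assuming that is the case, the correct answer to this question is that nothing further can be done: if the underlying C++ libraries are already silently running on multiple cores, parallelizing further in Python will not help.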
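
One way to test this hypothesis before parallelizing further: time.process_time() sums the CPU time of all threads in the current process, so if a single serial call reports substantially more CPU time than wall-clock time, the library is already using several cores internally. A hedged sketch; cpu_to_wall_ratio and busy_loop are hypothetical helpers, and the pure-Python busy loop only demonstrates the single-threaded baseline, where the ratio should stay at or below about 1:

```python
import time


def cpu_to_wall_ratio(fn, *args, **kwargs):
    """Return (CPU time / wall-clock time) for one call of fn.

    time.process_time() counts CPU time from every thread in this process,
    so a ratio well above 1 suggests the call ran on several cores at once.
    Work done in child processes is not counted.
    """
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn(*args, **kwargs)
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return cpu / wall


def busy_loop(n):
    # Pure-Python, single-threaded CPU work used as a baseline.
    total = 0
    for i in range(n):
        total += i * i
    return total


ratio = cpu_to_wall_ratio(busy_loop, 2000000)
print('CPU/wall ratio: {0:.2f}'.format(ratio))
```

To apply it to the question one could call, for example, cpu_to_wall_ratio(mlab.griddata, x.flatten(), y.flatten(), test.flatten(), x+u, y+v, interp='linear') on an older matplotlib that still provides mlab.griddata. A ratio close to 1 would suggest the call is essentially single-threaded, in which case some other explanation for the missing speedup would be needed.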

Python - multiprocessing for matplotlib griddata
