Question

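In both Python and Matlab, I wrote code that generates a matrix and fills it using a function of the indices. The Python code takes roughly 20 times as long to run as the Matlab code. I wrote two Python functions that produce the same result; bWay() is based on this answer.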

Here is the full Python code:

import numpy as np
from timeit import timeit

height = 1080
width = 1920
heightCm = 30
distanceCm = 70

centerY = height / 2 - 0.5
centerX = width / 2 - 0.5

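# float() avoids integer division under Python 2 (32400 / 70 would give 462),
# so the constant matches the Matlab value of about 462.857.
constPart = float(height) * heightCm / distanceCm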

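def aWay():
    # Fill the matrix element by element with an explicit double loop.
    # xrange is Python 2; on Python 3 use range instead.
    M = np.empty([height, width], dtype=np.float64)
    for y in xrange(height):
        for x in xrange(width):
            M[y, x] = np.arctan(pow((pow((centerX - x), 2) + pow((centerY - y), 2)), 0.5) / constPart)

def bWay():
    # Wrap the scalar formula as a ufunc and evaluate it over the outer
    # product of the row indices and the column indices.
    M = np.frompyfunc(
        lambda y, x: np.arctan(pow((pow((centerX - x), 2) + pow((centerY - y), 2)), 0.5) / constPart), 2, 1
    ).outer(
        np.arange(height),
        np.arange(width),
    ).astype(np.float64)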

Here is the full Matlab code:

height = 1080;
width = 1920;
heightCm = 30;
distanceCm = 70;

centerY = height / 2 + 0.5;
centerX = width / 2 + 0.5;

constPart = height * heightCm / distanceCm;
M = zeros(height, width);
for y = 1 : height
    for x = 1 : width
        M(y, x) = atan(((centerX - x)^2 + (centerY - y)^2)^0.5 / constPart);
    end
end

Python execution times, measured with timeit.timeit:

aWay() - 6.34s
bWay() - 6.68s

Matlab execution time, measured with tic/toc:

0.373s

To narrow it down, I timed arctan, squaring, and bare loop iteration separately.

Python:

>>> timeit('arctan(3)','from numpy import arctan', number = 1000000)
1.3365135641797679
>>> timeit('pow(3, 2)', number = 1000000)
0.11460829719908361
>>> timeit('power(3, 2)','from numpy import power', number = 1000000)
1.5427879383046275
>>> timeit('for x in xrange(10000000): pass', number = 1)
0.18364813832704385

Matlab:

tic
for i = 1 : 1000000
    atan(3);
end
toc
Elapsed time is 0.179802 seconds.
tic
for i = 1 : 1000000
    3^2;
end
toc
Elapsed time is 0.044160 seconds.
tic
for x = 1:10000000
end
toc
Elapsed time is 0.034853 seconds.

In all three cases, the Python code takes many times longer to run.

Is there anything I can do to improve the performance of this Python code?

Answer 1

I'll focus only on the Python part and how to optimize it (I've never used MATLAB, sorry).

If I understand your code correctly, you could use:

def fastway():
    # Open grids of row (y) and column (x) indices. This order yields a
    # (height, width) result, matching M[y, x] in aWay().
    y, x = np.ogrid[:height, :width]
    return np.arctan(np.hypot(centerX-x, centerY-y) / constPart)

This is vectorized and should be very fast.
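One way to settle the axis order is to compare a broadcast version against the explicit loop on a small grid. The sketch below assumes Python 3 and that the target is the (height, width) matrix that aWay() fills; the function names loop_version and vectorized_version are made up for this illustration.

```python
import numpy as np

def loop_version(height, width, heightCm=30, distanceCm=70):
    """Reference result: fill M[y, x] one element at a time, as aWay() does."""
    centerY = height / 2 - 0.5
    centerX = width / 2 - 0.5
    constPart = height * heightCm / distanceCm
    M = np.empty((height, width), dtype=np.float64)
    for y in range(height):
        for x in range(width):
            M[y, x] = np.arctan(((centerX - x) ** 2 + (centerY - y) ** 2) ** 0.5 / constPart)
    return M

def vectorized_version(height, width, heightCm=30, distanceCm=70):
    """Same formula, evaluated on broadcast index grids."""
    centerY = height / 2 - 0.5
    centerX = width / 2 - 0.5
    constPart = height * heightCm / distanceCm
    # y has shape (height, 1) and x has shape (1, width), so the
    # expression below broadcasts to (height, width).
    y, x = np.ogrid[:height, :width]
    return np.arctan(np.hypot(centerX - x, centerY - y) / constPart)

small_loop = loop_version(6, 8)
small_vec = vectorized_version(6, 8)
print(small_vec.shape)
print(np.allclose(small_loop, small_vec))
```

A small non-square grid is used on purpose: with equal height and width the shapes would agree even if the axes were swapped.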

%timeit fastway()
# 289 ms ± 9.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit aWay()
# 28.2 s ± 243 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit bWay()
# 29.3 s ± 790 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

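In case you're wondering: np.hypot(x, y) is the same as (x**2 + y**2)**0.5. It isn't necessarily faster, but it is shorter and gives more precise results in some edge cases.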
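One such edge case is overflow in the intermediate squares. The following is a minimal sketch with made-up input values, not something taken from the question: for very large inputs, squaring overflows to infinity while np.hypot still returns a finite result.

```python
import numpy as np

# Illustrative inputs: an ordinary pair and a pair whose squares overflow float64.
x = np.array([3.0, 1e200])
y = np.array([4.0, 1e200])

# Naive formula: x**2 overflows to inf for the second pair.
with np.errstate(over="ignore"):
    naive = (x**2 + y**2) ** 0.5

# np.hypot avoids the intermediate overflow.
robust = np.hypot(x, y)

print(naive)
print(robust)
```

For the pixel coordinates in the question the values are far too small for this to matter, so there the choice is mostly about brevity.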

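Also, if you need to operate on scalars, you shouldn't use NumPy functions. They carry so much per-call overhead that processing one element takes about as long as processing a thousand; see, for example, my answer on the question "Performance in different vectorization method in numpy".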
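A rough way to see the scalar overhead on your own machine is to time math.atan against np.arctan for a single float. This is only a sketch: the helper name compare_scalar_calls is hypothetical, and the absolute numbers depend on the machine and the NumPy version.

```python
import math
from timeit import timeit

import numpy as np

def compare_scalar_calls(number=100000):
    """Time `number` scalar arctangent calls with math and with NumPy."""
    t_math = timeit("atan(3.0)", setup="from math import atan", number=number)
    t_numpy = timeit("arctan(3.0)", setup="from numpy import arctan", number=number)
    return t_math, t_numpy

t_math, t_numpy = compare_scalar_calls()
print("math.atan:  %.4f s" % t_math)
print("np.arctan:  %.4f s" % t_numpy)
```

Both calls compute the same value; only the per-call cost differs, which is why the loop-based versions gain nothing from calling np.arctan on one element at a time.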

Answer 2

To complement MSeifert's answer, here is the vectorized Matlab code:

height = 1080;
width = 1920;
heightCm = 30;
distanceCm = 70;

centerY = height / 2 + 0.5;
centerX = width / 2 + 0.5;

constPart = height * heightCm / distanceCm;
[x, y] = meshgrid(1:width, 1:height);
M = atan(hypot(centerX-x, centerY-y) / constPart);

On my machine this takes 0.057 seconds, while the double loop takes 0.20 seconds.

On the same machine, MSeifert's Python solution takes 0.082 seconds.
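For readers comparing the two languages side by side, the Matlab snippet can be mirrored in NumPy with np.meshgrid. The sketch below assumes Python 3; the variable names M_one_based and M_zero_based are made up here. It also shows that the 1-based Matlab coordinates with a center of `size / 2 + 0.5` describe the same matrix as the 0-based Python coordinates with a center of `size / 2 - 0.5`.

```python
import numpy as np

height, width = 1080, 1920
heightCm, distanceCm = 30, 70
constPart = height * heightCm / distanceCm

# 1-based coordinates, as in the Matlab code. With the default "xy"
# indexing, both grids have shape (height, width).
x1, y1 = np.meshgrid(np.arange(1, width + 1), np.arange(1, height + 1))
M_one_based = np.arctan(
    np.hypot(width / 2 + 0.5 - x1, height / 2 + 0.5 - y1) / constPart
)

# 0-based coordinates, as in the Python code, using open grids
# that broadcast instead of full coordinate matrices.
y0, x0 = np.ogrid[:height, :width]
M_zero_based = np.arctan(
    np.hypot(width / 2 - 0.5 - x0, height / 2 - 0.5 - y0) / constPart
)

print(M_one_based.shape)
print(np.allclose(M_one_based, M_zero_based))
```

np.ogrid avoids materializing the two full coordinate matrices that meshgrid builds, which saves some memory for large images.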

numpy vs Matlab speed - arctan and power
