如何在不使用双循环的情况下加快矩阵的构建

时间:2021-06-09 14:00:05

标签: python performance numpy for-loop matrix

我想加速以下双 for 循环:

n = 10000
A = np.zeros((n, n))
twopi = 2.0 * np.pi
for i in range(0, n):
    for j in range(0, n):
        A[i,j] = twopi * i*j/n

我该怎么做?我的想法:计算一个向量 v 包含从 0n-1 的所有元素:

v = np.arange(n) 

然后构造两个矩阵IJ,一个矩阵的所有行都等于v,另一个矩阵的所有列都等于v。如果我没记错的话,我可以用

代替双for循环
A = twopi * I*J/n

这是正确的吗?我如何构建 IJ?有没有更好的方法?

4 个答案:

答案 0 :(得分:3)

像这样:

from numpy import outer, pi, arange

n = 10
i = arange(n)
A = 2 * pi / n * outer(i, i)

outer 产品满足您的要求。

编辑: 测试性能的简单方法如下:

from time import time

n = 10000
t = time()
for _ in range(10):
    i = arange(n)
    A = 2 * pi / n * outer(i, i)

t = time() - t
print("Time used [s]", t)

对我来说,这 10 次重复使用了 5 秒,rangearange 之间没有显着差异。

答案 1 :(得分:1)

您可以使用列表推导式:

A = np.array([   
        [twopi*i*j/n for j in range(n)] 
    for i in range(n)])

默认情况下 range 从零开始,因此 range(0,n) 而不是 range(n) 是多余的。

答案 2 :(得分:1)

In [118]: matrix = np.array([np.arange(n) for _ in range(n)])

In [119]: matrix
Out[119]:
array([[   0,    1,    2, ..., 9997, 9998, 9999],
       [   0,    1,    2, ..., 9997, 9998, 9999],
       [   0,    1,    2, ..., 9997, 9998, 9999],
       ...,
       [   0,    1,    2, ..., 9997, 9998, 9999],
       [   0,    1,    2, ..., 9997, 9998, 9999],
       [   0,    1,    2, ..., 9997, 9998, 9999]])

In [120]: matrix [0,1]
Out[120]: 1

In [121]: for j in range(n): matrix[:,j] *= j

In [122]: matrix
Out[122]:
array([[       0,        1,        4, ..., 99940009, 99960004, 99980001],
       [       0,        1,        4, ..., 99940009, 99960004, 99980001],
       [       0,        1,        4, ..., 99940009, 99960004, 99980001],
       ...,
       [       0,        1,        4, ..., 99940009, 99960004, 99980001],
       [       0,        1,        4, ..., 99940009, 99960004, 99980001],
       [       0,        1,        4, ..., 99940009, 99960004, 99980001]])

In [123]: matrix = matrix/n

In [124]: matrix = matrix * np.pi*2

In [125]: matrix
Out[125]:
array([[0.00000000e+00, 6.28318531e-04, 2.51327412e-03, ...,
        6.27941596e+04, 6.28067228e+04, 6.28192873e+04],
       [0.00000000e+00, 6.28318531e-04, 2.51327412e-03, ...,
        6.27941596e+04, 6.28067228e+04, 6.28192873e+04],
       [0.00000000e+00, 6.28318531e-04, 2.51327412e-03, ...,
        6.27941596e+04, 6.28067228e+04, 6.28192873e+04],
       ...,
       [0.00000000e+00, 6.28318531e-04, 2.51327412e-03, ...,
        6.27941596e+04, 6.28067228e+04, 6.28192873e+04],
       [0.00000000e+00, 6.28318531e-04, 2.51327412e-03, ...,
        6.27941596e+04, 6.28067228e+04, 6.28192873e+04],
       [0.00000000e+00, 6.28318531e-04, 2.51327412e-03, ...,
        6.27941596e+04, 6.28067228e+04, 6.28192873e+04]])

与脚本相同:

matrix = np.array([np.arange(n) for _ in range(n)])
for j in range(n): matrix[:,j] *= j
matrix = matrix/n
matrix = matrix * np.pi*2

答案 3 :(得分:1)

答案

你确实可以使用numpy的广播能力。

import numpy as np
n = 10000
twopi = 2.0 * np.pi / n
A = np.arange(n) * np.arange(n).reshape(-1, 1) * twopi

第二个np.arange()的重塑推动了产品的传播。

即使从时间性能的角度来看,这与将 np.outer() 调用为 Dr.V did 非常相似。

基准

配置:

import numpy as np
n = 10000
twopi = 2 * np.pi / n

次数:

# List comprehension
A = np.array([   
    [twopi*i*j for j in range(n)] 
for i in range(n)])
# timeit > 10.2 s ± 129 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Outer
i = np.arange(n)
A = twopi * np.outer(i, i)
# timeit > 234 ms ± 6.87 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Product
A = np.arange(n) * np.arange(n).reshape(-1, 1) * twopi
# timeit > 208 ms ± 41.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

如前所述,outer 和product 方法非常相似。 但是列表推导式并不是因为它违背了 numpy 的目的,即使用分布式子任务(以及通常的低级优化)对数组计算进行向量化。