My code

#!/usr/bin/env python

"""Calculate a score for a square matrix."""

import numpy as np

# Seed NumPy's generator: the matrices below come from np.random, not the random module.
np.random.seed(0)


def calculate_score(cm):
    """
    Calculate a score measuring how close the big elements of cm are to the diagonal.

    Examples
    --------
    >>> cm = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
    >>> calculate_score(cm)
    32
    """
    score = 0
    for i, line in enumerate(cm):
        for j, el in enumerate(line):
            score += el * abs(i - j)
    return score


def main(n):
    import time
    import numpy as np
    score_calculations = 10**3

    t = 0
    for step in range(score_calculations):
        cm = np.random.randint(0, 150000, size=(n, n))
        t0 = time.time()
        calculate_score(cm)
        t1 = time.time()
        t += (t1 - t0)
    print("{:0.2f} scores / sec".format(score_calculations / t))

if __name__ == '__main__':
    main(369)
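For reference, the double loop computes the weighted sum below, where the weight of entry (i, j) is its distance |i - j| from the main diagonal. Writing out the doctest case by hand confirms the expected value 32:

```latex
S(C) = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} c_{ij}\,\lvert i-j \rvert

C = \begin{pmatrix}0&1&2\\3&4&5\\6&7&8\end{pmatrix},\qquad
W = \bigl(\lvert i-j \rvert\bigr)_{ij} = \begin{pmatrix}0&1&2\\1&0&1\\2&1&0\end{pmatrix}

S(C) = (0\cdot0 + 1\cdot1 + 2\cdot2) + (3\cdot1 + 4\cdot0 + 5\cdot1) + (6\cdot2 + 7\cdot1 + 8\cdot0) = 5 + 8 + 19 = 32
```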

Analysis

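The current code only manages 32.47 scores/sec. I tried looping over the indices instead (range(n) in the loops), but that lowered the speed to 20.02 scores/sec.

Running kernprof -l -v main.py gives the following: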

Wrote profile results to main.py.lprof
Timer unit: 1e-06 s

Total time: 109.124 s
File: main.py
Function: calculate_score at line 9

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     9                                           @profile
    10                                           def calculate_score(cm):
    11                                               """
    12                                               Calculate a score how close big elements of cm are to the diagonal.
    13                                           
    14                                               Examples
    15                                               --------
    16                                               >>> cm = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
    17                                               >>> calculate_score(cm)
    18                                               32
    19                                               """
    20      1000          619      0.6      0.0      score = 0
    21    370000       180693      0.5      0.2      for i, line in enumerate(cm):
    22 136530000     43691655      0.3     40.0          for j, el in enumerate(line):
    23 136161000     65250190      0.5     59.8              score += el * abs(i - j)
    24      1000          386      0.4      0.0      return score

I'm not sure whether anything can make this faster, since the code seems very simple.

Answer 1

Here is a vectorized approach that uses broadcasting to compute the weights, then matrix multiplication with np.tensordot to perform the sum-reductions:

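import numpy as np

def calculate_score_vectorized(cm):
    m, n = cm.shape
    # wghts[i, j] = |j - i|, each entry's distance from the diagonal, via broadcasting
    wghts = np.abs(np.arange(n) - np.arange(m)[:, None])
    # Sum-reduce cm * wghts over both axes
    return np.tensordot(cm, wghts, axes=((0, 1), (0, 1)))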
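To see what the broadcasting step builds, and to check the vectorized version against the original loop, here is a small self-contained sketch. The random generator, its seed and the 50 x 50 test size are illustrative choices, not part of the answer:

```python
import numpy as np


def calculate_score(cm):
    # Reference double loop from the question.
    score = 0
    for i, line in enumerate(cm):
        for j, el in enumerate(line):
            score += el * abs(i - j)
    return score


def calculate_score_vectorized(cm):
    m, n = cm.shape
    # A row (n,) minus a column (m, 1) broadcasts to an (m, n) array
    # whose entry [i, j] is j - i; abs() turns it into the diagonal distance.
    wghts = np.abs(np.arange(n) - np.arange(m)[:, None])
    # Contract both axes: the sum over i, j of cm[i, j] * wghts[i, j].
    return np.tensordot(cm, wghts, axes=((0, 1), (0, 1)))


print(np.abs(np.arange(3) - np.arange(3)[:, None]))
# [[0 1 2]
#  [1 0 1]
#  [2 1 0]]

rng = np.random.default_rng(0)
cm = rng.integers(0, 150000, size=(50, 50))
assert int(calculate_score_vectorized(cm)) == int(calculate_score(cm))
```

Because m and n are read separately, the same weights construction also works for non-square matrices.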

The final sum-reduction step can also be computed with np.einsum:

np.einsum('ij,ij', cm, wghts)

or simply with element-wise multiplication followed by a sum:

(cm*wghts).sum()
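A quick sketch confirming that the three reductions agree on a question-sized matrix. The explicit int64 dtypes are an assumption added here so the products cannot overflow on platforms whose default integer is 32-bit:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 369
# Explicit int64 keeps the accumulation in 64 bits on every platform.
cm = rng.integers(0, 150000, size=(n, n), dtype=np.int64)
wghts = np.abs(np.arange(n) - np.arange(n)[:, None]).astype(np.int64)

# 1) tensordot over both axes
via_tensordot = int(np.tensordot(cm, wghts, axes=((0, 1), (0, 1))))
# 2) einsum in implicit mode: repeated indices i, j are summed to a scalar
via_einsum = int(np.einsum('ij,ij', cm, wghts))
# 3) element-wise product, then a full sum
via_multiply = int((cm * wghts).sum())

assert via_tensordot == via_einsum == via_multiply
print(via_tensordot)
```

The element-wise version materializes the full n x n product array before summing; the other two express the reduction directly, which is one plausible reason to prefer them, though relative timings will depend on the machine and NumPy version.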

Runtime test:

In [104]: n = 369

In [105]: cm = np.random.randint(0, 150000, size=(n, n))

In [106]: calculate_score(cm)
Out[106]: 1257948732168

In [107]: calculate_score_vectorized(cm)
Out[107]: array(1257948732168)

In [108]: %timeit calculate_score(cm)
10 loops, best of 3: 31.4 ms per loop

In [109]: %timeit calculate_score_vectorized(cm)
1000 loops, best of 3: 675 µs per loop

In [110]: 31400/675.0
Out[110]: 46.51851851851852

That's a 46x+ speedup for the given dataset size.
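The %timeit comparison above relies on IPython. Here is a minimal sketch of the same comparison as a plain script, using the standard-library timeit.repeat and keeping the best of 3 runs as %timeit reported. best_time is a hypothetical helper, the loop counts are illustrative, and absolute numbers will differ by machine and NumPy version:

```python
import timeit

import numpy as np


def calculate_score(cm):
    # Original double loop from the question.
    score = 0
    for i, line in enumerate(cm):
        for j, el in enumerate(line):
            score += el * abs(i - j)
    return score


def calculate_score_vectorized(cm):
    m, n = cm.shape
    wghts = np.abs(np.arange(n) - np.arange(m)[:, None])
    return np.tensordot(cm, wghts, axes=((0, 1), (0, 1)))


def best_time(func, arg, number, repeat=3):
    # Run func(arg) `number` times per trial, `repeat` trials, and return
    # the fastest per-call time in seconds (like %timeit's "best of 3").
    times = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(times) / number


if __name__ == '__main__':
    n = 369
    cm = np.random.randint(0, 150000, size=(n, n))
    # Few iterations for the slow loop, many for the fast vectorized call.
    t_loop = best_time(calculate_score, cm, number=3)
    t_vec = best_time(calculate_score_vectorized, cm, number=1000)
    print("loop:       {:.1f} ms per call".format(t_loop * 1e3))
    print("vectorized: {:.1f} us per call".format(t_vec * 1e6))
    print("speedup:    {:.1f}x".format(t_loop / t_vec))
```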

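As mentioned in the comments, if the shape of the input arrays stays the same, we can compute the weights wghts once and reuse them with any of the sum-reduction methods above for a further boost.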
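If the shapes do vary between calls, one way to still avoid rebuilding wghts every time is to cache it per shape. This sketch swaps in functools.lru_cache, which the answer does not use; weights_for_shape and calculate_score_cached are hypothetical names, and the cache is unbounded, so it assumes only a handful of distinct shapes occur:

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=None)
def weights_for_shape(m, n):
    # Hypothetical helper: build the |i - j| weight matrix once per (m, n).
    wghts = np.abs(np.arange(n) - np.arange(m)[:, None]).astype(np.int64)
    # Every caller shares the cached array, so make it read-only.
    wghts.setflags(write=False)
    return wghts


def calculate_score_cached(cm):
    # cm.shape is a tuple of ints, so it is hashable and usable as a cache key.
    wghts = weights_for_shape(*cm.shape)
    return int(np.tensordot(cm, wghts, axes=((0, 1), (0, 1))))


cm = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(calculate_score_cached(cm))  # 32
print(calculate_score_cached(cm))  # 32, weights reused from the cache
print(weights_for_shape.cache_info().hits)  # 1
```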

Full code

#!/usr/bin/env python

"""Calculate a score for a square matrix."""

import numpy as np

# Seed NumPy's generator: the matrices below come from np.random, not the random module.
np.random.seed(0)


def calculate_score(cm, weights):
    """
    Calculate a score measuring how close the big elements of cm are to the diagonal.

    Examples
    --------
    >>> cm = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
    >>> weights = calculate_weight_matrix(3)
    >>> calculate_score(cm, weights)
    32
    """
    return int(np.tensordot(cm, weights, axes=((0, 1), (0, 1))))


def calculate_weight_matrix(n):
    """
    Calculate the weights for each position.

    The weight is the distance to the diagonal.
    """
    weights = np.abs(np.arange(n) - np.arange(n)[:, None])
    return weights


def measure_time(n):
    """Measure the time of calculate_score for n x n matrices."""
    import time
    score_calculations = 10**3

    t = 0
    weights = calculate_weight_matrix(n)
    for step in range(score_calculations):
        cm = np.random.randint(0, 150000, size=(n, n))
        t0 = time.time()
        calculate_score(cm, weights)
        t1 = time.time()
        t += (t1 - t0)
    print("{:0.2f} scores / sec".format(score_calculations / t))

if __name__ == '__main__':
    import doctest
    doctest.testmod()
    measure_time(369)

This gives between 10044.31 and 10381.71 scores/sec (before: 32.47 scores/sec). That's a speedup of at least 309×!
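One caveat worth checking: for n = 369 the scores are around 1.26e12 (see Out[106] above), far beyond the 32-bit integer range. On platforms where NumPy's default integer is 32-bit (for example Windows with NumPy 1.x), np.random.randint and np.arange produce int32 arrays, and np.tensordot on two int32 arrays accumulates in int32 and silently wraps around. A minimal sketch that forces 64-bit accumulation, assuming non-negative integer inputs as in the question:

```python
import numpy as np


def calculate_weight_matrix(n):
    # Same |i - j| weights as the full code, but with an explicit 64-bit dtype.
    idx = np.arange(n, dtype=np.int64)
    return np.abs(idx - idx[:, None])


def calculate_score(cm, weights):
    # Convert the input to int64 (np.asarray skips the copy if it already is)
    # so the dot product accumulates in 64 bits on every platform.
    cm = np.asarray(cm, dtype=np.int64)
    return int(np.tensordot(cm, weights, axes=((0, 1), (0, 1))))


n = 369
weights = calculate_weight_matrix(n)
# Deliberately use int32 input to mimic a platform with a 32-bit default int.
cm = np.full((n, n), 149999, dtype=np.int32)
expected = 149999 * int(weights.sum())
assert calculate_score(cm, weights) == expected
print(calculate_score(cm, weights))
```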

What is the fastest way to compute a weighted sum of matrix elements?