Numpy max is slow when applied to a list of arrays

Asked: 2013-04-10 17:59:12

Tags: python numpy

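I perform some computations that produce a list of numpy arrays. Afterwards, I want to find the maximum along the first axis. My current implementation (see below) is very slow, and I would like to find an alternative.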

Original

pending = [<list of items>]
matrix = [compute(item) for item in pending if <some condition on item>]
dominant = np.max(matrix, axis = 0)

Revision 1: this implementation is faster (~10x, presumably because numpy does not have to figure out the shape of the array)

pending = [<list of items>]
matrix = [compute(item) for item in pending if <some condition on item>]
matrix = np.vstack(matrix)
dominant = np.max(matrix, axis = 0)

I ran a few tests, and the slowdown seems to come from the list of arrays being converted internally to a numpy array:

 Timer unit: 1e-06 s
 Total time: 1.21389 s
 Line # Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
 4                                           def direct_max(list_of_arrays):
 5      1000      1213886   1213.9    100.0      np.max(list_of_arrays, axis = 0)

 Total time: 1.20766 s
 Line # Hits         Time  Per Hit   % Time  Line Contents
 ==============================================================
 8                                           def numpy_max(list_of_arrays):
 9      1000      1151281   1151.3     95.3      list_of_arrays = np.array(list_of_arrays)
10      1000        56384     56.4      4.7      np.max(list_of_arrays, axis = 0)

Total time: 0.15437 s
Line # Hits         Time  Per Hit   % Time  Line Contents
==============================================================
12                                           @profile
13                                           def stack_max(list_of_arrays):
14      1000       102205    102.2     66.2      list_of_arrays = np.vstack(list_of_arrays)
15      1000        52165     52.2     33.8      np.max(list_of_arrays, axis = 0)

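Is there a way to speed up the max function, or can I efficiently fill a numpy array with the results of my computation so that max is as fast as possible?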
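To make the second option concrete, here is a minimal sketch of filling a preallocated array instead of building a list. The helpers `compute` and `keep`, the function `filled_max`, and the `width` parameter are hypothetical stand-ins for the real computation and condition, and the sketch assumes every result is a 1-D float array of the same known length.

```python
import numpy as np


def compute(item):
    # Hypothetical stand-in for the real computation: returns a 1-D array.
    return np.full(4, float(item))


def keep(item):
    # Hypothetical stand-in for the condition on each item.
    return item % 2 == 0


def filled_max(pending, width):
    # Allocate room for the worst case (every item passes the filter).
    buf = np.empty((len(pending), width))
    count = 0
    for item in pending:
        if keep(item):
            buf[count] = compute(item)  # copy the result into the next free row
            count += 1
    # Reduce only over the rows that were actually written.
    return buf[:count].max(axis=0)


dominant = filled_max(range(10), width=4)
```

Note that if no item passes the condition, `buf[:count]` is empty and `max` raises a ValueError, so that case would need separate handling.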

1 answer:

Answer 0 (score: 3)

You can use reduce(np.maximum, matrix). Here is a test:

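import numpy as np
from functools import reduce  # reduce is a builtin only on Python 2

np.random.seed(0)

N, M = 1000, 1000
# M arrays of length N (on Python 2, xrange can be used instead of range)
matrix = [np.random.rand(N) for _ in range(M)]

# %timeit is an IPython magic, so run these lines in IPython or Jupyter
%timeit np.max(matrix, axis=0)
%timeit np.max(np.vstack(matrix), axis=0)
%timeit reduce(np.maximum, matrix)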

The results are:

10 loops, best of 3: 116 ms per loop
10 loops, best of 3: 10.6 ms per loop
100 loops, best of 3: 3.66 ms per loop
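reduce(np.maximum, matrix) takes the elementwise maximum of the arrays two at a time, so the list is never converted into one 2-D array. As a rough sketch of the same idea, the loop below keeps a single accumulator and writes into it through the `out` argument of np.maximum. The name `running_max` is made up for illustration, the sketch assumes all arrays share one shape and dtype, and it has not been timed against the variants above.

```python
import numpy as np


def running_max(arrays):
    # Start from a copy so the caller's first array is not modified.
    acc = np.array(arrays[0], copy=True)
    for a in arrays[1:]:
        # Elementwise maximum written back into acc instead of a new array.
        np.maximum(acc, a, out=acc)
    return acc


rng = np.random.RandomState(0)
matrix = [rng.rand(1000) for _ in range(100)]

result = running_max(matrix)
# Same values as stacking first and reducing along axis 0.
assert np.array_equal(result, np.max(np.vstack(matrix), axis=0))
```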

Edit

argmax() is harder, but you can use a for loop:

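def argmax_list(matrix):
    # m holds the running maximum, idx the index of the array it came from
    m = matrix[0].copy()
    idx = np.zeros(len(m), dtype=int)  # np.int was removed in NumPy 1.24
    for i, a in enumerate(matrix[1:], 1):
        mask = m < a  # positions where array i beats the running maximum
        m[mask] = a[mask]
        idx[mask] = i
    return idx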

It is still faster than argmax():

%timeit np.argmax(matrix, axis=0)
%timeit np.argmax(np.vstack(matrix), axis=0)
%timeit argmax_list(matrix)

Results:

10 loops, best of 3: 131 ms per loop
10 loops, best of 3: 21 ms per loop
100 loops, best of 3: 13.1 ms per loop
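As a sanity check, the loop can be compared with np.argmax on a stacked copy. Because the comparison `m < a` is strict, a later array only replaces the index when it is strictly larger, which matches np.argmax returning the first occurrence on ties. The test data below is made up for illustration (small integers, so ties are guaranteed), and NaN handling is not covered.

```python
import numpy as np


def argmax_list(matrix):
    # The loop from the answer, repeated so the snippet is self-contained.
    m = matrix[0].copy()
    idx = np.zeros(len(m), dtype=int)
    for i, a in enumerate(matrix[1:], 1):
        mask = m < a
        m[mask] = a[mask]
        idx[mask] = i
    return idx


# 20 arrays of 50 values drawn from {0, 1, 2}: every column contains ties.
rng = np.random.RandomState(0)
small = [rng.randint(0, 3, size=50) for _ in range(20)]

expected = np.argmax(np.vstack(small), axis=0)
assert np.array_equal(argmax_list(small), expected)
```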