Using Python NumPy, how do I most efficiently subtract an NxM matrix elementwise from an Nx1 matrix?

Time: 2018-08-29 07:18:06

Tags: arrays python-2.7 performance numpy transpose

x is a 3x4 NumPy matrix, defined as follows:

x = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
In: x
Out:
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

y is a 3x1 matrix, defined as follows:

y = np.array([3, 6 ,9])
In: y
Out: array([3, 6, 9])

What is the most efficient way to compute y - x elementwise, so that the result is:

array([[ 2,  1,  0, -1],
       [ 1,  0, -1, -2],
       [ 0, -1, -2, -3]])

The only way I have found to do this is:

-1.0*(x.T + (-1.0*y)).T

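However, profiling shows that, because I perform this computation many times and on large matrices, that last line turns out to be the bottleneck of my application. So I ask: is there a better, more efficient way to do this?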

2 answers:

Answer 0: (score: 1)

  

"y is a 3x1 matrix, defined as follows:

y = np.array([3, 6 ,9])"

That is not a 3x1 matrix; it is a one-dimensional array with three elements:

>>> y.shape
(3,)

A 3x1 matrix is created with

>>> y_new = np.array([[3], [6], [9]])
>>> y_new.shape
(3, 1)

or, from your existing y, with:

>>> y_new = y[:, np.newaxis]

Once you actually have a 3x1 and a 3x4 matrix, you can subtract them directly (y - x, as the question asks):

>>> y_new - x
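
As a quick check, the sketch below computes the y - x result the question asks for via broadcasting and compares it with the transpose-based expression from the question. The names y_col, result and via_transpose are illustrative only.

```python
import numpy as np

x = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
y = np.array([3, 6, 9])

# Column vector of shape (3, 1); broadcasting stretches it across the 4 columns of x.
y_col = y[:, np.newaxis]
result = y_col - x

# The transpose-based expression from the question, for comparison (it yields floats).
via_transpose = -1.0 * (x.T + (-1.0 * y)).T

print(result)
print(np.array_equal(result, via_transpose))  # True: same values, different dtype
```

Unlike the original expression, the broadcast version keeps the integer dtype and needs neither transposes nor multiplications by -1.0.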

Answer 1: (score: 0)

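As others have already pointed out, NumPy's broadcasting is your friend. Note that, because of these broadcasting rules, transpose operations are actually needed far less often in NumPy than in other matrix-oriented tech stacks (read: MATLAB / Octave).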
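
To illustrate why the transposes in the question were needed at all, here is a minimal sketch of how broadcasting treats the 1-D y from the question. The variable names failed, via_transpose and direct are illustrative.

```python
import numpy as np

x = np.arange(1, 13).reshape(3, 4)   # same values as x in the question, shape (3, 4)
y = np.array([3, 6, 9])              # shape (3,), a 1-D array

# Broadcasting aligns shapes from the trailing axis: (3,) vs (3, 4) compares 3 with 4 and fails.
try:
    y - x
    failed = False
except ValueError:
    failed = True

# Against x.T, shape (4, 3), the trailing axes match, which is why the transposes worked.
via_transpose = (y - x.T).T

# With an explicit column axis, shape (3, 1) broadcasts against (3, 4) directly.
direct = y[:, None] - x

print(failed)
print(np.array_equal(via_transpose, direct))
```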

EDITED (restructured)

The key is to get an array with the correct shape. The best way is to use slicing with an extra np.newaxis / None value, but you can also use ndarray.reshape():

import numpy as np
x = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
y = np.array([3, 6, 9]).reshape(-1, 1)  # same as: y = np.array([3, 6, 9])[:, None]
y - x
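
A small sketch, assuming the 1-D y from the question, showing that the three spellings produce the same shape and that none of them copies the data (the names a, b and c are illustrative):

```python
import numpy as np

y = np.array([3, 6, 9])

a = y[:, np.newaxis]
b = y[:, None]         # np.newaxis is an alias for None
c = y.reshape(-1, 1)   # -1 lets NumPy infer the length of that axis

print(a.shape, b.shape, c.shape)   # (3, 1) (3, 1) (3, 1)
print(np.newaxis is None)          # True
# Both results are views on y's buffer, so no data is copied here.
print(np.shares_memory(y, a), np.shares_memory(y, c))   # True True
```

Because these are views, the cost of reshaping is independent of the array size; only the subtraction itself touches the data.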

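On top of that, correctly shaped arrays let you use numexpr, which can be more efficient than NumPy for large arrays (if this operation is your bottleneck, it may be a good fit for your algorithm):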

import numpy as np
import numexpr as ne

x = np.random.randint(1, 100, (3, 4))
y = np.random.randint(1, 100, (3, 1))

%timeit y - x
# The slowest run took 43.14 times longer than the fastest. This could mean that an intermediate result is being cached.
# 1000000 loops, best of 3: 879 ns per loop

%timeit ne.evaluate('y - x')
# The slowest run took 20.86 times longer than the fastest. This could mean that an intermediate result is being cached.
# 100000 loops, best of 3: 10.8 µs per loop

# not so exciting for small arrays, but for somewhat larger numbers...
x = np.random.randint(1, 100, (3000, 4000))
y = np.random.randint(1, 100, (3000, 1))

%timeit y - x
# 10 loops, best of 3: 33.1 ms per loop
%timeit ne.evaluate('y - x')
# 100 loops, best of 3: 10.7 ms per loop

# which is roughly a factor 3 faster on my machine

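In this case it should not matter much how you get to the correct array shape (slicing vs. reshape), but slicing seems to be somewhat faster, roughly 1.5 times in the timings below. To add some numbers (edited following the comments):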

import numpy as np

# array creation time hardly depends on the shape, as long as the total size is the same

%timeit y = np.zeros((3000000))
# 838 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit y = np.zeros((3000000, 1))
# 825 µs ± 12.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit y = np.zeros((3000, 1000))
# 827 µs ± 14.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


# ...and reshaping / slicing is independent of the array size

x = np.zeros(3000000)
%timeit x[:, None]
# 147 ns ± 4.02 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit x.reshape(-1, 1)
# 214 ns ± 9.55 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

x = np.zeros(300)
%timeit x[:, None]
# 146 ns ± 0.659 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit x.reshape(-1, 1)
# 212 ns ± 1.56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Needless to say, %timeit benchmarks should be taken with a grain of salt.
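
%timeit is an IPython magic, so it is not available in a plain script. The following is a minimal sketch of a similar measurement using the standard-library timeit module. It also times np.subtract with a preallocated out buffer, which is not part of the answer above but may be worth measuring when the same subtraction is repeated many times. The array sizes and the names plain, in_place and out are assumptions chosen for illustration.

```python
import timeit
import numpy as np

x = np.random.randint(1, 100, (1000, 1000))
y = np.random.randint(1, 100, (1000, 1))

# Preallocated result buffer, reused across calls (an assumption about the use case).
out = np.empty_like(x)

def plain():
    # Allocates a new result array on every call.
    return y - x

def in_place():
    # Writes the broadcast result into the existing buffer instead.
    return np.subtract(y, x, out=out)

# repeat() returns one total per run; the minimum is usually the least noisy figure.
n = 20
t_plain = min(timeit.repeat(plain, number=n, repeat=5)) / n
t_in_place = min(timeit.repeat(in_place, number=n, repeat=5)) / n

print("y - x:            %.3f ms per call" % (t_plain * 1e3))
print("np.subtract(out): %.3f ms per call" % (t_in_place * 1e3))
```

Whether the preallocated variant helps depends on array size, dtype and machine, so the numbers are only meaningful when measured on your own workload.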