用Scipy计算行方式点积两个矩阵的矢量化方式

时间:2013-03-25 13:55:24

标签: numpy scipy vectorization matrix-multiplication dot-product

我想尽可能快地计算相同维度的两个矩阵的行方点积。这就是我这样做的方式:

import numpy as np
a = np.array([[1,2,3], [3,4,5]])
b = np.array([[1,2,3], [1,2,3]])
result = np.array([])
for row1, row2 in a, b:
    result = np.append(result, np.dot(row1, row2))
print result

当然输出是:

[ 26.  14.]

5 个答案:

答案 0 :(得分:24)

查看numpy.einsum了解其他方法:

In [52]: a
Out[52]: 
array([[1, 2, 3],
       [3, 4, 5]])

In [53]: b
Out[53]: 
array([[1, 2, 3],
       [1, 2, 3]])

In [54]: einsum('ij,ij->i', a, b)
Out[54]: array([14, 26])

看起来einsuminner1d快一点:

In [94]: %timeit inner1d(a,b)
1000000 loops, best of 3: 1.8 us per loop

In [95]: %timeit einsum('ij,ij->i', a, b)
1000000 loops, best of 3: 1.6 us per loop

In [96]: a = random.randn(10, 100)

In [97]: b = random.randn(10, 100)

In [98]: %timeit inner1d(a,b)
100000 loops, best of 3: 2.89 us per loop

In [99]: %timeit einsum('ij,ij->i', a, b)
100000 loops, best of 3: 2.03 us per loop

答案 1 :(得分:22)

直截了当的方法是:

import numpy as np
a=np.array([[1,2,3],[3,4,5]])
b=np.array([[1,2,3],[1,2,3]])
np.sum(a*b, axis=1)

避免了python循环,在以下情况下更快:

def npsumdot(x, y):
    return np.sum(x*y, axis=1)

def loopdot(x, y):
    result = np.empty((x.shape[0]))
    for i in range(x.shape[0]):
        result[i] = np.dot(x[i], y[i])
    return result

timeit npsumdot(np.random.rand(500000,50),np.random.rand(500000,50))
# 1 loops, best of 3: 861 ms per loop
timeit loopdot(np.random.rand(500000,50),np.random.rand(500000,50))
# 1 loops, best of 3: 1.58 s per loop

答案 2 :(得分:20)

玩弄了这个,发现inner1d最快:

enter image description here

情节是用perfplot(我的一个小项目)创建的

import numpy
from numpy.core.umath_tests import inner1d
import perfplot

perfplot.show(
    setup=lambda n: (numpy.random.rand(n, 3), numpy.random.rand(n, 3)),
    n_range=[2**k for k in range(1, 18)],
    kernels=[
        lambda data: numpy.sum(data[0] * data[1], axis=1),
        lambda data: numpy.einsum('ij, ij->i', data[0], data[1]),
        lambda data: inner1d(data[0], data[1])
        ],
    labels=['np.sum(a*b, axis=1)', 'einsum', 'inner1d'],
    logx=True,
    logy=True,
    xlabel='len(a), len(b)'
    )

答案 3 :(得分:4)

你会更好地避免使用append,但我想不出一种避免python循环的方法。也许是自定义Ufunc?我不认为numpy.vectorize会帮助你。

import numpy as np
a=np.array([[1,2,3],[3,4,5]])
b=np.array([[1,2,3],[1,2,3]])
result=np.empty((2,))
for i in range(2):
    result[i] = np.dot(a[i],b[i]))
print result

修改

基于this answer,如果现实问题中的向量为1D,inner1d可能会有效。

from numpy.core.umath_tests import inner1d
inner1d(a,b)  # array([14, 26])

答案 4 :(得分:0)

我遇到了这个答案,并使用在Python 3.5中运行的Numpy 1.14.3重新验证了结果。在大多数情况下,以上答案在我的系统上都是正确的,尽管我发现对于非常大的矩阵(请参见下面的示例),除一种方法外,其他所有方法都非常接近,因此性能差异毫无意义。

对于较小的矩阵,我发现einsum是最快的,而且幅度相当大,在某些情况下可达两倍。

我的大型矩阵示例:

import numpy as np
from numpy.core.umath_tests import inner1d
a = np.random.randn(100, 1000000)  # 800 MB each
b = np.random.randn(100, 1000000)  # pretty big.

def loop_dot(a, b):
    result = np.empty((a.shape[1],))
    for i, (row1, row2) in enumerate(zip(a, b)):
        result[i] = np.dot(row1, row2)
%timeit inner1d(a, b)
# 128 ms ± 523 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.einsum('ij,ij->i', a, b)
# 121 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.sum(a*b, axis=1)
# 411 ms ± 1.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit loop_dot(a, b)  # note the function call took negligible time
# 123 ms ± 342 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

因此,einsum在超大型矩阵上仍然是最快的,但数量很少。不过,这似乎是具有统计意义的(微小)数量!