考虑两个ndarrays
长n
,arr1
和arr2
。我计算了以下产品总数,然后num_runs
次进行基准测试:
import numpy as np
import time
num_runs = 1000
n = 100
arr1 = np.random.rand(n)
arr2 = np.random.rand(n)
start_comp = time.clock()
for r in xrange(num_runs):
sum_prods = np.sum( [arr1[i]*arr2[j] for i in xrange(n)
for j in xrange(i+1, n)] )
print "total time for comprehension = ", time.clock() - start_comp
start_loop = time.clock()
for r in xrange(num_runs):
sum_prod = 0.0
for i in xrange(n):
for j in xrange(i+1, n):
sum_prod += arr1[i]*arr2[j]
print "total time for loop = ", time.clock() - start_loop
输出
total time for comprehension = 3.23097066953
total time for comprehension = 3.9045544426
所以使用列表理解会显得更快。
是否有更高效的实现,使用Numpy例程来计算这样的产品总和?
答案 0 :(得分:12)
将操作重新排列为O(n)运行时算法而不是O(n ^ 2),和利用NumPy获取产品和总和:
# arr1_weights[i] is the sum of all terms arr1[i] gets multiplied by in the
# original version
arr1_weights = arr2[::-1].cumsum()[::-1] - arr2
sum_prods = arr1.dot(arr1_weights)
时间显示这比n == 100
的列表理解快约200倍。
In [21]: %%timeit
....: np.sum([arr1[i] * arr2[j] for i in range(n) for j in range(i+1, n)])
....:
100 loops, best of 3: 5.13 ms per loop
In [22]: %%timeit
....: arr1_weights = arr2[::-1].cumsum()[::-1] - arr2
....: sum_prods = arr1.dot(arr1_weights)
....:
10000 loops, best of 3: 22.8 µs per loop
答案 1 :(得分:8)
矢量化方式:np.sum(np.triu(np.multiply.outer(arr1,arr2),1))
。
提高了30倍:
In [9]: %timeit np.sum(np.triu(np.multiply.outer(arr1,arr2),1))
1000 loops, best of 3: 272 µs per loop
In [10]: %timeit np.sum( [arr1[i]*arr2[j] for i in range(n)
for j in range(i+1, n)]
100 loops, best of 3: 7.9 ms per loop
In [11]: allclose(np.sum(np.triu(np.multiply.outer(arr1,arr2),1)),
np.sum(np.triu(np.multiply.outer(arr1,arr2),1)))
Out[11]: True
另一个快速的方法是使用numba:
from numba import jit
@jit
def t(arr1,arr2):
s=0
for i in range(n):
for j in range(i+1,n):
s+= arr1[i]*arr2[j]
return s
获得10倍的新因素:
In [12]: %timeit t(arr1,arr2)
10000 loops, best of 3: 21.1 µs per loop
使用@ user2357112最小答案,
@jit
def t2357112(arr1,arr2):
s=0
c=0
for i in range(n-2,-1,-1):
c += arr2[i+1]
s += arr1[i]*c
return s
的
In [13]: %timeit t2357112(arr1,arr2)
100000 loops, best of 3: 2.33 µs per loop
,只是做必要的操作。
答案 2 :(得分:3)
您可以使用以下广播技巧:
a = np.sum(np.triu(arr1[:,None]*arr2[None,:],1))
b = np.sum( [arr1[i]*arr2[j] for i in xrange(n) for j in xrange(i+1, n)] )
print a == b # True
基本上,我付出了在arr1
和arr2
中成对计算所有元素的乘积的价格,以便利用numpy广播/矢量化的速度在低速下更快地完成级别代码。
和时间:
%timeit np.sum(np.triu(arr1[:,None]*arr2[None,:],1))
10000 loops, best of 3: 55.9 µs per loop
%timeit np.sum( [arr1[i]*arr2[j] for i in xrange(n) for j in xrange(i+1, n)] )
1000 loops, best of 3: 1.45 ms per loop