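Efficient double sum of products

Asked: 2016-05-24 14:48:52

Tags: python numpy

Consider two ndarrays of length n, arr1 and arr2. I compute the following sum of products and repeat it num_runs times as a benchmark: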

import numpy as np
import time

num_runs = 1000
n = 100

arr1 = np.random.rand(n)
arr2 = np.random.rand(n)

start_comp = time.clock()
for r in xrange(num_runs):
    sum_prods = np.sum( [arr1[i]*arr2[j] for i in xrange(n) 
                         for j in xrange(i+1, n)] )

print "total time for comprehension = ", time.clock() - start_comp

start_loop = time.clock()
for r in xrange(num_runs):
    sum_prod = 0.0
    for i in xrange(n):
        for j in xrange(i+1, n):
            sum_prod += arr1[i]*arr2[j]

print "total time for loop = ", time.clock() - start_loop

Output:

total time for comprehension = 3.23097066953
total time for loop = 3.9045544426

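So the list comprehension appears to be somewhat faster than the explicit loop.

Is there a more efficient implementation, using NumPy routines, to compute such a sum of products?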

3 answers:

Answer 0 (score: 12)

Rearrange the operation into an O(n) algorithm instead of O(n^2), and let NumPy handle the products and sums:

# arr1_weights[i] is the sum of all terms arr1[i] gets multiplied by in the
# original version
arr1_weights = arr2[::-1].cumsum()[::-1] - arr2

sum_prods = arr1.dot(arr1_weights)

Timing shows this is about 200 times faster than the list comprehension for n == 100.

In [21]: %%timeit
   ....: np.sum([arr1[i] * arr2[j] for i in range(n) for j in range(i+1, n)])
   ....: 
100 loops, best of 3: 5.13 ms per loop

In [22]: %%timeit
   ....: arr1_weights = arr2[::-1].cumsum()[::-1] - arr2
   ....: sum_prods = arr1.dot(arr1_weights)
   ....: 
10000 loops, best of 3: 22.8 µs per loop
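
The rearrangement works because each arr1[i] is multiplied by every arr2[j] with j > i, so the double sum collapses to a dot product of arr1 with the suffix sums of arr2 (excluding the element at index i itself). The following is a minimal Python 3 sketch of that idea with a brute-force check; the function names upper_pair_sum and upper_pair_sum_bruteforce are illustrative and do not come from the answer.

```python
import numpy as np

def upper_pair_sum(arr1, arr2):
    """Sum of arr1[i] * arr2[j] over all pairs with i < j, in O(n)."""
    # suffix[i] = arr2[i] + arr2[i+1] + ... + arr2[-1]
    suffix = arr2[::-1].cumsum()[::-1]
    # Subtract arr2[i] so that only the terms with j > i remain.
    weights = suffix - arr2
    return arr1.dot(weights)

def upper_pair_sum_bruteforce(arr1, arr2):
    """Reference O(n^2) double loop, used only to check the result."""
    n = len(arr1)
    return sum(arr1[i] * arr2[j] for i in range(n) for j in range(i + 1, n))

# Compare both versions on random data (seed chosen arbitrarily).
rng = np.random.default_rng(0)
a = rng.random(100)
b = rng.random(100)
assert np.isclose(upper_pair_sum(a, b), upper_pair_sum_bruteforce(a, b))
```

For arr1 = [1, 2, 3] and arr2 = [4, 5, 6], the weights are [11, 6, 0], so the result is 1*11 + 2*6 + 3*0 = 23, which matches 1*5 + 1*6 + 2*6.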

Answer 1 (score: 8)

A vectorized way: np.sum(np.triu(np.multiply.outer(arr1,arr2),1))

This gives a 30x improvement:

In [9]: %timeit np.sum(np.triu(np.multiply.outer(arr1,arr2),1))
1000 loops, best of 3: 272 µs per loop

In [10]: %timeit np.sum( [arr1[i]*arr2[j] for i in range(n) 
                         for j in range(i+1, n)] )
100 loops, best of 3: 7.9 ms per loop

In [11]: allclose(np.sum(np.triu(np.multiply.outer(arr1,arr2),1)),
np.sum([arr1[i]*arr2[j] for i in range(n) for j in range(i+1, n)]))
Out[11]: True

Another fast approach is to use numba:

from numba import jit
@jit
def t(arr1,arr2):
    s=0
    for i in range(n):
        for j in range(i+1,n):
            s+= arr1[i]*arr2[j]
    return s

This gains roughly another factor of 10:

In [12]: %timeit t(arr1,arr2)
10000 loops, best of 3: 21.1 µs per loop

Using @user2357112's minimal O(n) approach inside numba:

@jit
def t2357112(arr1,arr2):
    s=0
    c=0
    for i in range(n-2,-1,-1):
        c += arr2[i+1]
        s += arr1[i]*c
    return s

In [13]: %timeit t2357112(arr1,arr2)
100000 loops, best of 3: 2.33 µs per loop

This version performs only the necessary operations.
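
The loop in t2357112 walks the arrays from the end, keeping a running sum of the arr2 elements to the right of the current index. As a hedged illustration, here is the same single pass in plain Python 3 without numba. The name upper_pair_sum_single_pass is hypothetical, and unlike the jitted functions above this sketch takes n from the input instead of relying on a global.

```python
def upper_pair_sum_single_pass(arr1, arr2):
    """Sum of arr1[i] * arr2[j] over i < j using one backward pass."""
    n = len(arr1)
    total = 0.0
    suffix = 0.0  # running sum of arr2[i+1:], built from the right
    for i in range(n - 2, -1, -1):
        suffix += arr2[i + 1]       # extend the suffix by one element
        total += arr1[i] * suffix   # arr1[i] times everything to its right
    return total

# Small check against the explicit pair sum: 1*5 + 1*6 + 2*6 = 23
assert upper_pair_sum_single_pass([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]) == 23.0
```

The last index is skipped because no j > n-1 exists, which is why the loop starts at n-2.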

Answer 2 (score: 3)

You can use the following broadcasting trick:

a = np.sum(np.triu(arr1[:,None]*arr2[None,:],1))
b = np.sum( [arr1[i]*arr2[j] for i in xrange(n) for j in xrange(i+1, n)] )
print a == b  # True

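Basically, I pay the price of computing the products of all pairs of elements of arr1 and arr2, including the ones that are later discarded, in order to take advantage of NumPy broadcasting/vectorization, which does the work much faster in low-level code.

And the timings: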
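
To make the broadcasting step concrete, the sketch below runs the same expression on two three-element arrays and keeps the intermediate results. The variable names products, upper and total are illustrative only.

```python
import numpy as np

arr1 = np.array([1.0, 2.0, 3.0])
arr2 = np.array([4.0, 5.0, 6.0])

# Shapes (3, 1) and (1, 3) broadcast to (3, 3):
# products[i, j] == arr1[i] * arr2[j]
products = arr1[:, None] * arr2[None, :]

# k=1 zeroes the diagonal and everything below it, keeping only i < j.
upper = np.triu(products, 1)

# Remaining entries are 1*5, 1*6 and 2*6.
total = upper.sum()
assert np.isclose(total, 23.0)
```

The intermediate table has n*n entries, so memory use grows quadratically with n even though only about half of the entries contribute to the sum.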

%timeit np.sum(np.triu(arr1[:,None]*arr2[None,:],1))
10000 loops, best of 3: 55.9 µs per loop

%timeit np.sum( [arr1[i]*arr2[j] for i in xrange(n) for j in xrange(i+1, n)] )
1000 loops, best of 3: 1.45 ms per loop