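I have the following code, which computes the mean absolute difference between the elements of an array, each element against every other. Is there a more efficient way to do this than nested loops, for example with a NumPy function?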
import numpy as np
a = np.array([0.02625, -0.04125, -0.00875, -0.05625, 0.04375, 0.03625])
delta = []
for i in range(len(a) - 1):
    for j in range(i+1, len(a)):
        delta.append(a[i] - a[j])
delta = np.array(delta)
avg_dist = np.sum(np.abs(delta)) / delta.size
Answer 0 (score: 2)
Approach #1
Use np.triu_indices / np.tril_indices to get the pairwise indices, index the input array with them, and compute the differences:
I,J = np.triu_indices(len(a),1)
delta = a[I] - a[J]
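As a quick check of the snippet above, the following sketch applies it to the array from the question and compares the result with the nested loop. The name `ref` is introduced here only for illustration; everything else comes from the question and the answer.

```python
import numpy as np

a = np.array([0.02625, -0.04125, -0.00875, -0.05625, 0.04375, 0.03625])

# Index pairs (i, j) with i < j, i.e. the strict upper triangle,
# in the same row-major order the nested loop visits them.
I, J = np.triu_indices(len(a), 1)
delta = a[I] - a[J]

# Mean absolute pairwise difference, as avg_dist in the question.
avg_dist = np.abs(delta).mean()

# Reference result from the question's nested loop (illustrative name).
ref = np.array([a[i] - a[j]
                for i in range(len(a) - 1)
                for j in range(i + 1, len(a))])

print(np.allclose(delta, ref))
```

Note that I and J each hold n*(n-1)/2 integers, so this approach trades memory for the absence of a Python-level loop.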
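Approach #2
We can also use slicing inside a single loop. This should be memory-efficient, because it avoids generating the index arrays that the previous approach needs: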
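def pairwise_diff(a):
    n = len(a)
    N = n*(n-1)//2  # number of pairs (i, j) with i < j
    # Offsets into the output: row i contributes n-1-i differences
    idx = np.concatenate(( [0], np.arange(n-1,0,-1).cumsum() ))
    start, stop = idx[:-1], idx[1:]
    out = np.empty(N,dtype=a.dtype)
    for i in range(n-1):
        # a[i] minus every later element, written into its slot
        out[start[i]:stop[i]] = a[i] - a[i+1:]
    return out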
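A possible usage sketch, assuming the goal is the avg_dist value from the question: it repeats the function so the block runs on its own, then checks the output against the np.triu_indices approach. The name `expected` is illustrative only.

```python
import numpy as np

def pairwise_diff(a):
    n = len(a)
    N = n * (n - 1) // 2  # number of pairs (i, j) with i < j
    # Offsets into the output: row i contributes n-1-i differences.
    idx = np.concatenate(([0], np.arange(n - 1, 0, -1).cumsum()))
    start, stop = idx[:-1], idx[1:]
    out = np.empty(N, dtype=a.dtype)
    for i in range(n - 1):
        out[start[i]:stop[i]] = a[i] - a[i + 1:]
    return out

a = np.array([0.02625, -0.04125, -0.00875, -0.05625, 0.04375, 0.03625])

delta = pairwise_diff(a)
avg_dist = np.abs(delta).mean()

# Cross-check against the index-based approach (illustrative name).
I, J = np.triu_indices(len(a), 1)
expected = a[I] - a[J]
print(np.allclose(delta, expected))
```

The loop still runs n-1 times in Python, but each iteration does a vectorized subtraction on a slice, which is where the speed-up over the nested loop comes from.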
Timings on a large array with 10000 elements:
In [214]: a = np.random.rand(10000)
# Approach #1
In [215]: %%timeit
     ...: I,J = np.triu_indices(len(a),1)
     ...: delta = a[I] - a[J]
1 loop, best of 3: 627 ms per loop
# Approach #2
In [216]: %timeit pairwise_diff(a)
10 loops, best of 3: 69.1 ms per loop
# Original approach
In [217]: %%timeit
     ...: delta = []
     ...: for i in range(len(a) - 1):
     ...:     for j in range(i+1, len(a)):
     ...:         delta.append(a[i] - a[j])
1 loop, best of 3: 15.7 s per loop
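If only the mean absolute difference is needed (avg_dist in the question) and not the individual differences, the pairwise array can be skipped altogether. The sketch below is not one of the approaches timed above. It is a different technique, based on sorting, and the function name `mean_abs_pairwise_diff` is made up for illustration. It relies on the fact that in a sorted array the element at position k is the larger member of k pairs and the smaller member of n-1-k pairs, so it enters the sum of absolute differences with coefficient 2k - n + 1. It assumes a 1-D array with at least two elements.

```python
import numpy as np

def mean_abs_pairwise_diff(a):
    """Mean of |a[i] - a[j]| over all pairs i < j, via sorting.

    Hypothetical helper, assuming a 1-D array with at least 2 elements.
    """
    x = np.sort(a)
    n = x.size
    # x[k] is added k times and subtracted (n - 1 - k) times.
    coeff = 2 * np.arange(n) - n + 1
    return (x * coeff).sum() / (n * (n - 1) / 2)

a = np.array([0.02625, -0.04125, -0.00875, -0.05625, 0.04375, 0.03625])
avg_dist = mean_abs_pairwise_diff(a)

# Cross-check against the explicit pairwise differences.
I, J = np.triu_indices(len(a), 1)
print(np.isclose(avg_dist, np.abs(a[I] - a[J]).mean()))
```

This needs O(n log n) time and O(n) memory instead of O(n^2) for both, at the cost of not producing the delta array itself.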