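Efficiently computing a metric over all pairwise combinations of array elements

Date: 2018-04-17 16:39:15

Tags: python numpy combinatorics

I have the following code, which computes the average absolute difference between the elements of an array, taking every element against every other element. Is there a more efficient way to do this than the nested loops, for example with a NumPy function?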

import numpy as np

a = np.array([0.02625, -0.04125, -0.00875, -0.05625, 0.04375, 0.03625])

delta = []
for i in range(len(a) - 1):
    for j in range(i + 1, len(a)):
        delta.append(a[i] - a[j])
delta = np.array(delta)

avg_dist = np.sum(np.abs(delta)) / delta.size

1 Answer:

Answer 0: (score: 2)

Approach #1

Use np.triu_indices / np.tril_indices to get the pairwise indices, index into the input array with them, and compute the differences:

I,J = np.triu_indices(len(a),1)
delta = a[I] - a[J]
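
The snippet above stops at delta. As a minimal sketch of carrying it through to the avg_dist value from the question, the following uses the question's sample array and keeps the nested loops only as a reference for comparison. The names ref and ref_avg are introduced here for illustration and are not part of the original answer.

```python
import numpy as np

a = np.array([0.02625, -0.04125, -0.00875, -0.05625, 0.04375, 0.03625])

# Index pairs (i, j) with i < j, in the same row-major order as the nested loops
I, J = np.triu_indices(len(a), 1)
delta = a[I] - a[J]

# Mean absolute pairwise difference
avg_dist = np.abs(delta).mean()

# Reference values from the question's nested loops (illustrative names)
ref = np.array([a[i] - a[j]
                for i in range(len(a) - 1)
                for j in range(i + 1, len(a))])
ref_avg = np.sum(np.abs(ref)) / ref.size

print(np.allclose(delta, ref), np.isclose(avg_dist, ref_avg))
```

Because np.triu_indices returns the pairs row by row, delta should come out in the same order as the list built by the loops, so the two can be compared element by element.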

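Approach #2

We can also use slicing in a single loop, which should be more memory-efficient because it avoids generating the index arrays that the previous approach needs: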

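def pairwise_diff(a):
    n = len(a)
    N = n*(n-1)//2  # number of unordered pairs
    # Offsets into the output: element i contributes n-1-i differences
    idx = np.concatenate(( [0], np.arange(n-1,0,-1).cumsum() ))
    start, stop = idx[:-1], idx[1:]
    out = np.empty(N,dtype=a.dtype)
    for i in range(n-1):
        # a[i] minus every later element, written into that element's slot
        out[start[i]:stop[i]] = a[i] - a[i+1:]
    return out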
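
To make the start/stop bookkeeping in the function above easier to follow, here is a small sketch that prints the offsets for n = 4 and checks them against the row indices from np.triu_indices. The value n = 4 is an arbitrary choice for illustration.

```python
import numpy as np

n = 4  # small size chosen only for illustration

# Same offset construction as in pairwise_diff
idx = np.concatenate(([0], np.arange(n - 1, 0, -1).cumsum()))
start, stop = idx[:-1], idx[1:]

print(idx)    # offsets: element 0 fills 3 slots, element 1 fills 2, element 2 fills 1
print(start)  # first output slot for each i
print(stop)   # one past the last output slot for each i

# Each slice [start[i]:stop[i]] should cover exactly the pairs whose first index is i
I, J = np.triu_indices(n, 1)
for i in range(n - 1):
    print(i, I[start[i]:stop[i]], J[start[i]:stop[i]])
```

The last offset equals n*(n-1)//2, the total number of pairs, which is why the output array in pairwise_diff is allocated with that length.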

Timings on a large array of 10000 elements:

In [214]: a = np.random.rand(10000)

# Approach #1
In [215]: %%timeit
     ...: I,J = np.triu_indices(len(a),1)
     ...: delta = a[I] - a[J]
1 loop, best of 3: 627 ms per loop

# Approach #2
In [216]: %timeit pairwise_diff(a)
10 loops, best of 3: 69.1 ms per loop

# Original approach
In [217]: %%timeit
     ...: delta = []
     ...: for i in range(len(a) - 1):
     ...:   for j in range(i+1, len(a)):
     ...:     delta.append(a[i] - a[j])
1 loop, best of 3: 15.7 s per loop
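
If only the final avg_dist is needed and not the individual differences, the pairwise array can be skipped altogether. The sketch below is a different technique from the two approaches above and is not part of the original answer: it sorts the array and uses the fact that, in sorted order, the element at position k appears with a plus sign in k pairs and with a minus sign in n-1-k pairs. The function name mean_abs_pairwise_diff is hypothetical, and the sketch assumes a 1-D array with at least two elements.

```python
import numpy as np

def mean_abs_pairwise_diff(a):
    """Mean of |a[i] - a[j]| over all i < j, without building the pairs.

    Assumes a is 1-D with at least two elements (hypothetical helper).
    """
    x = np.sort(np.asarray(a, dtype=float))
    n = x.size
    # In sorted order, x[k] is the larger element in k pairs and the
    # smaller element in n-1-k pairs, so its net coefficient is 2k - n + 1
    coeff = 2 * np.arange(n) - n + 1
    total = np.dot(coeff, x)
    return total / (n * (n - 1) / 2)

# Compare against the explicit pairwise differences on random data
rng = np.random.RandomState(0)
a = rng.rand(1000)
I, J = np.triu_indices(len(a), 1)
expected = np.abs(a[I] - a[J]).mean()
print(np.isclose(mean_abs_pairwise_diff(a), expected))
```

This needs O(n log n) time and O(n) memory instead of the O(n^2) required to hold every difference, at the cost of only working for this particular metric.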