Vectorizing a Euclidean distance calculation - NumPy

Time: 2016-12-18 18:50:06

Tags: python numpy scipy vectorization euclidean-distance

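My question is about vectorizing my code. I have one array that holds 3D coordinates and another that holds the edges connecting those coordinates: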

In [8]: coords
Out[8]: 
array([[ 11.22727013,  24.72620964,   2.02986932],
       [ 11.23895836,  24.67577744,   2.04130101],
       [ 11.23624039,  24.63677788,   2.04096866],
       [ 11.22516632,  24.5986824 ,   2.04045677],
       [ 11.21166992,  24.56095695,   2.03898215],
       [ 11.20334721,  24.5227356 ,   2.03556442],
       [ 11.2064085 ,  24.48479462,   2.03098583],
       [ 11.22059727,  24.44837189,   2.02649784],
       [ 11.24213409,  24.41513252,   2.01979685]])

In [13]: edges
Out[13]: 
array([[0, 1],
       [1, 2],
       [2, 3],
       [3, 4],
       [4, 5],
       [5, 6],
       [6, 7],
       [7, 8]], dtype=int32)

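Now I would like to compute the sum of the Euclidean distances between the coordinates listed in the edges array, e.g. the distance from coords[0] to coords[1] + the distance from coords[1] to coords[2] and so on.

I have the following code, which does the job: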

import numpy as np
from scipy.spatial import distance

def networkLength(coords, edges):
    distancesNetwork = np.array([])

    # Append the length of each edge, one edge per iteration.
    for i in range(edges.shape[0]):
        distancesNetwork = np.append(distancesNetwork, distance.euclidean(coords[edges[i, 0]], coords[edges[i, 1]]))

    return sum(distancesNetwork)

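I am wondering whether the code can be vectorized instead of using a loop. What is the Pythonic way to do this? Thanks a lot!!

1 Answer:

Answer 0 (score: 2)

Approach #1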

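Instead of iterating over the edges one at a time, we can slice out the first and second columns of edges in full and use them to index into coords. The Euclidean distance computation then becomes: subtract the two indexed arrays, square element-wise, sum along each row, and take the element-wise square root. Finally, we sum all of those values into one scalar, as in the original code.

Thus, one vectorized implementation would be -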

np.sqrt(((coords[edges[:, 0]] - coords[edges[:, 1]])**2).sum(1)).sum()
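To make the one-liner easier to follow, here is a minimal sketch that splits it into named steps. The small coords and edges arrays and the intermediate variable names are made up for illustration and are not the data from the question.

```python
import numpy as np

# Hypothetical inputs chosen so the edge lengths are easy to check by hand.
coords = np.array([[0.0, 0.0, 0.0],
                   [3.0, 4.0, 0.0],
                   [3.0, 4.0, 12.0]])
edges = np.array([[0, 1],
                  [1, 2]])

start = coords[edges[:, 0]]         # (n_edges, 3): first endpoint of every edge
end = coords[edges[:, 1]]           # (n_edges, 3): second endpoint of every edge
diff = start - end                  # per-edge coordinate differences
sq_sum = (diff ** 2).sum(axis=1)    # squared length of every edge
lengths = np.sqrt(sq_sum)           # Euclidean length of every edge (5 and 12 here)
total = lengths.sum()               # total network length as one scalar

print(total)  # 17.0
```

Indexing with a whole column of edges at once is what removes the Python-level loop: coords[edges[:, 0]] gathers all start points in a single call.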

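NumPy also has a built-in for this kind of distance computation, np.linalg.norm. Performance-wise, I would expect it to be comparable to what we just listed. For completeness, the implementation would be -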

np.linalg.norm(coords[edges[:, 0]] - coords[edges[:, 1]],axis=1).sum()
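A quick way to convince yourself that the two forms agree is to run both on the same random input. This is only a sketch: the seed, the array sizes, and the variable names are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)                  # arbitrary seed, for repeatability
coords = rng.random((100, 3))                   # 100 random 3D points
edges = rng.integers(0, 100, size=(1000, 2))    # 1000 random index pairs

diff = coords[edges[:, 0]] - coords[edges[:, 1]]

manual = np.sqrt((diff ** 2).sum(axis=1)).sum()   # explicit square, sum, sqrt
via_norm = np.linalg.norm(diff, axis=1).sum()     # row-wise 2-norm, then sum

same = np.isclose(manual, via_norm)
print(same)  # True
```

With axis=1, np.linalg.norm computes the 2-norm of each row, which is exactly the per-edge Euclidean length.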

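Approach #2

Tweaking the previous approach, we can use np.einsum, which performs both the squaring and the summing along each row in one step and should therefore be somewhat more efficient.

The implementation would look something like this -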

s = coords[edges[:, 0]] - coords[edges[:, 1]]
out = np.sqrt(np.einsum('ij,ij->i',s,s)).sum()
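If the 'ij,ij->i' subscript string looks opaque, the following sketch compares it with the explicit multiply-and-sum on a tiny array. The values in s here are invented for illustration.

```python
import numpy as np

# Hypothetical difference vectors, one per row.
s = np.array([[1.0, 2.0, 2.0],
              [0.0, 3.0, 4.0]])

# 'ij,ij->i': multiply s by s element-wise and sum over j (the columns),
# keeping i (the rows). No intermediate squared array is stored.
row_sq_einsum = np.einsum('ij,ij->i', s, s)

# The same quantity written out in two steps.
row_sq_manual = (s * s).sum(axis=1)

lengths = np.sqrt(row_sq_einsum)   # row lengths: 3 and 5
print(np.array_equal(row_sq_einsum, row_sq_manual))  # True
```

The index j appears in both inputs but not in the output, so einsum sums over it; that is the row-wise sum of squares.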

Runtime test

Function definitions -

import numpy as np
from scipy.spatial import distance

def networkLength(coords, edges): # Original code from question
    distancesNetwork = np.array([])
    for i in range(edges.shape[0]):
        distancesNetwork = np.append(distancesNetwork,
            distance.euclidean(coords[edges[i, 0]], coords[edges[i, 1]]))
    return sum(distancesNetwork)

def vectorized_app1(coords, edges):
    return np.sqrt(((coords[edges[:, 0]] - coords[edges[:, 1]])**2).sum(1)).sum()

def vectorized_app2(coords, edges):
    s = coords[edges[:, 0]] - coords[edges[:, 1]]
    return np.sqrt(np.einsum('ij,ij->i',s,s)).sum()

Verification and timings -

In [114]: # Setup bigger inputs
     ...: coords = np.random.rand(100,3)
     ...: edges = np.random.randint(0,100,(10000,2))

# Verify results across all approaches
In [115]: networkLength(coords, edges)
Out[115]: 6607.8829431403547

In [116]: vectorized_app1(coords, edges)
Out[116]: 6607.8829431403337

In [117]: vectorized_app2(coords, edges)
Out[117]: 6607.8829431403337

In [118]: %timeit networkLength(coords, edges)
     ...: %timeit vectorized_app1(coords, edges)
     ...: %timeit vectorized_app2(coords, edges)
     ...: 
1 loops, best of 3: 519 ms per loop
1000 loops, best of 3: 822 µs per loop
1000 loops, best of 3: 668 µs per loop
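The loop result above differs from the vectorized ones in the last few digits (…547 versus …337). That is most likely just floating-point rounding from a different summation order, so np.isclose is a more suitable check than exact equality. To repeat the comparison outside IPython, a sketch along these lines with the standard timeit module should work. The seed and the repetition count are my own choices, and the timings will differ from the figures above depending on machine and NumPy version.

```python
import timeit
import numpy as np

def vectorized_app1(coords, edges):
    return np.sqrt(((coords[edges[:, 0]] - coords[edges[:, 1]]) ** 2).sum(1)).sum()

def vectorized_app2(coords, edges):
    s = coords[edges[:, 0]] - coords[edges[:, 1]]
    return np.sqrt(np.einsum('ij,ij->i', s, s)).sum()

# Same shapes as in the test above; the seed is arbitrary.
rng = np.random.default_rng(0)
coords = rng.random((100, 3))
edges = rng.integers(0, 100, size=(10000, 2))

# Compare results with a tolerance rather than with ==.
r1 = vectorized_app1(coords, edges)
r2 = vectorized_app2(coords, edges)
results_agree = np.isclose(r1, r2)
print("results agree:", results_agree)

# Time each function; 200 repetitions is an arbitrary choice.
n_runs = 200
for fn in (vectorized_app1, vectorized_app2):
    seconds = timeit.timeit(lambda: fn(coords, edges), number=n_runs)
    print(f"{fn.__name__}: {seconds / n_runs * 1e6:.1f} µs per call")
```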