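My question is about vectorizing my code. I have an array of 3D coordinates and an array of edge information describing which coordinates are connected: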
In [8]: coords
Out[8]:
array([[ 11.22727013, 24.72620964, 2.02986932],
       [ 11.23895836, 24.67577744, 2.04130101],
       [ 11.23624039, 24.63677788, 2.04096866],
       [ 11.22516632, 24.5986824 , 2.04045677],
       [ 11.21166992, 24.56095695, 2.03898215],
       [ 11.20334721, 24.5227356 , 2.03556442],
       [ 11.2064085 , 24.48479462, 2.03098583],
       [ 11.22059727, 24.44837189, 2.02649784],
       [ 11.24213409, 24.41513252, 2.01979685]])
In [13]: edges
Out[13]:
array([[0, 1],
       [1, 2],
       [2, 3],
       [3, 4],
       [4, 5],
       [5, 6],
       [6, 7],
       [7, 8]], dtype=int32)
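Now I want to compute the sum of the Euclidean distances between the coordinates paired in the edges array, e.g. the distance from coords[0] to coords[1] + the distance from coords[1] to coords[2], and so on.
I have the following code, which does the job: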
import numpy as np

def networkLength(coords, edges):
    from scipy.spatial import distance
    # Append one Euclidean distance per edge, then sum them all
    distancesNetwork = np.array([])
    for i in range(edges.shape[0]):
        distancesNetwork = np.append(distancesNetwork, distance.euclidean(coords[edges[i, 0]], coords[edges[i, 1]]))
    return sum(distancesNetwork)
I was wondering whether the code can be vectorized instead of using a loop. What would be the Pythonic way to do that? Thanks a lot!
Answer 0 (score: 2)
Approach #1
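Instead of iterating over the edges one element at a time, we can slice out the first and second columns of edges in one go and use them to index into coords. The Euclidean distance computation then amounts to squaring the differences element-wise, summing along each row, and taking the element-wise square root. Finally, we sum all of those values into a single scalar, as the original code does.
Thus, one vectorized implementation would be -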
np.sqrt(((coords[edges[:, 0]] - coords[edges[:, 1]])**2).sum(1)).sum()
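NumPy has the built-in np.linalg.norm for this kind of distance computation. In terms of performance, I would expect it to be comparable to what we just listed. For the sake of completeness, the implementation would be -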
np.linalg.norm(coords[edges[:, 0]] - coords[edges[:, 1]],axis=1).sum()
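To make the indexing step concrete, here is a minimal sketch that walks through Approach #1 on a tiny input. The coordinates and the names start, end, diff and lengths are made up for illustration (they are not from the question); the values are chosen so that the two edge lengths come out as 5 and 12.

```python
import numpy as np

# Hypothetical toy data: three points joined by two edges
coords = np.array([[0.0, 0.0, 0.0],
                   [3.0, 4.0, 0.0],
                   [3.0, 4.0, 12.0]])
edges = np.array([[0, 1],
                  [1, 2]])

# Fancy indexing gathers the start and end point of every edge at once
start = coords[edges[:, 0]]   # shape (n_edges, 3)
end = coords[edges[:, 1]]     # shape (n_edges, 3)
diff = start - end

# Square, sum along each row, take the square root: one length per edge
lengths = np.sqrt((diff ** 2).sum(axis=1))
total = lengths.sum()

# The same total via np.linalg.norm with axis=1
total_norm = np.linalg.norm(diff, axis=1).sum()

print(lengths, total, total_norm)
```

Both totals should agree up to floating-point rounding, since np.linalg.norm with axis=1 computes the same row-wise 2-norm.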
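Approach #2
Tweaking the earlier approach, we can use np.einsum to perform the squaring and the summing along each row in a single step. This avoids creating the intermediate array of squared values, so it should be somewhat more efficient.
The implementation would look something like this -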
s = coords[edges[:, 0]] - coords[edges[:, 1]]
out = np.sqrt(np.einsum('ij,ij->i',s,s)).sum()
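As a quick sanity check of the subscript string, the sketch below compares the 'ij,ij->i' contraction with the explicit square-and-sum from Approach #1. The random inputs and the seed are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed, illustration only
coords = rng.random((100, 3))
edges = rng.integers(0, 100, size=(1000, 2))

s = coords[edges[:, 0]] - coords[edges[:, 1]]

# 'ij,ij->i': multiply s by itself element-wise and sum over j (the columns),
# leaving one squared length per row i
sq_einsum = np.einsum('ij,ij->i', s, s)
sq_explicit = (s ** 2).sum(axis=1)

same = np.allclose(sq_einsum, sq_explicit)
out = np.sqrt(sq_einsum).sum()
print(same, out)
```

The index j appears in both inputs but not in the output, which is what tells einsum to sum over it.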
Runtime test
Function definitions -
import numpy as np
from scipy.spatial import distance

def networkLength(coords, edges):  # Original code from question
    distancesNetwork = np.array([])
    for i in range(edges.shape[0]):
        distancesNetwork = np.append(distancesNetwork,
            distance.euclidean(coords[edges[i, 0]], coords[edges[i, 1]]))
    return sum(distancesNetwork)

def vectorized_app1(coords, edges):  # Approach #1: square, row-sum, sqrt
    return np.sqrt(((coords[edges[:, 0]] - coords[edges[:, 1]])**2).sum(1)).sum()

def vectorized_app2(coords, edges):  # Approach #2: einsum for the row-wise sum of squares
    s = coords[edges[:, 0]] - coords[edges[:, 1]]
    return np.sqrt(np.einsum('ij,ij->i',s,s)).sum()
Verification and timings -
In [114]: # Setup bigger inputs
...: coords = np.random.rand(100,3)
...: edges = np.random.randint(0,100,(10000,2))
# Verify results across all approaches
In [115]: networkLength(coords, edges)
Out[115]: 6607.8829431403547
In [116]: vectorized_app1(coords, edges)
Out[116]: 6607.8829431403337
In [117]: vectorized_app2(coords, edges)
Out[117]: 6607.8829431403337
In [118]: %timeit networkLength(coords, edges)
...: %timeit vectorized_app1(coords, edges)
...: %timeit vectorized_app2(coords, edges)
...:
1 loops, best of 3: 519 ms per loop
1000 loops, best of 3: 822 µs per loop
1000 loops, best of 3: 668 µs per loop
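For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The helper names loop_length and vectorized_length are hypothetical, and the loop version uses np.linalg.norm per edge as a SciPy-free stand-in for the question's code, so absolute timings will differ from the ones above. Results are compared with np.isclose rather than ==, because different summation orders can change the last few digits, as the outputs above show.

```python
import timeit
import numpy as np

def loop_length(coords, edges):
    # Stand-in for the question's loop: one norm call per edge (no SciPy needed)
    return sum(np.linalg.norm(coords[a] - coords[b]) for a, b in edges)

def vectorized_length(coords, edges):
    # Approach #2 from the answer
    s = coords[edges[:, 0]] - coords[edges[:, 1]]
    return np.sqrt(np.einsum('ij,ij->i', s, s)).sum()

rng = np.random.default_rng(0)          # arbitrary seed, illustration only
coords = rng.random((100, 3))
edges = rng.integers(0, 100, size=(10000, 2))

ref = loop_length(coords, edges)
vec = vectorized_length(coords, edges)
agree = np.isclose(ref, vec)

# Total seconds for 3 calls each; the numbers depend on the machine
t_loop = timeit.timeit(lambda: loop_length(coords, edges), number=3)
t_vec = timeit.timeit(lambda: vectorized_length(coords, edges), number=3)
print(agree, t_loop, t_vec)
```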