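How can I compute the element-wise Euclidean distance between two NumPy arrays? For example, I have two arrays (call them A and B), each a 3x3 grid of 3-component vectors, and I want the Euclidean distance between A[0,0] and B[0,0], then between A[0,1] and B[0,1], and so on. The output array should therefore also be 3x3.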
If I try scipy.spatial.distance.cdist, I get the error ValueError: XA must be a 2-dimensional array.
import numpy as np
from scipy.spatial.distance import cdist
a = np.array([
[(0,255,0),(255,255,0),(0,255,0)],
[(0,255,0),(255,255,0),(0,255,0)],
[(0,255,0),(255,255,0),(0,255,0)],
])
b = np.array([
[(255,255,0),(255,255,0),(0,255,0)],
[(255,255,0),(255,255,0),(0,255,0)],
[(255,255,0),(255,255,0),(0,255,0)],
])
dists = cdist(a, b, 'euclidean')
print(dists)
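Ideally, I would also like to use other metrics, such as cdist(a,b,'cityblock'), cdist(a,b,'sqeuclidean'), etc.
Edit: the output I want looks like this (the values are made up, but the 3x3 shape is correct):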
[[100, 0, 100]
[100, 0, 100]
[100, 0, 100]]
In other words, I expect:
[[cdist((0,255,0), (255,255,0)), cdist((255,255,0), (255,255,0)), cdist((0,255,0), (0,255,0))],
[...]
[...]]
Answer 0 (score: 1)
Several approaches are listed below.
Approach 1
Inspired by this post, we can solve it in a vectorized manner. Following the wiki contents of the eucl_dist package (disclaimer: I am its author), we can leverage matrix-multiplication and some NumPy specific implementations, like so:
def elementwise_cdist_v1(a,b):
    # Squared norm of every vector along the last axis
    s_a = np.einsum('ijk,ijk->ij',a,a)
    s_b = np.einsum('ijk,ijk->ij',b,b)
    # |a-b|^2 = |a|^2 + |b|^2 - 2*(a.b)
    return np.sqrt(s_a+s_b-2*np.einsum('ijk,ijk->ij',a,b))
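One caveat with the expansion used above: with floating-point inputs, |a|^2 + |b|^2 - 2*(a.b) can come out slightly negative for nearly identical vectors, and np.sqrt then returns nan. Below is a minimal sketch of a guarded variant. The name elementwise_cdist_v1_safe, the cast to float, and the clamp with np.maximum are my assumptions, not part of the original answer.

```python
import numpy as np

def elementwise_cdist_v1_safe(a, b):
    # Cast to float so integer inputs behave the same as float inputs
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    s_a = np.einsum('ijk,ijk->ij', a, a)
    s_b = np.einsum('ijk,ijk->ij', b, b)
    sq = s_a + s_b - 2 * np.einsum('ijk,ijk->ij', a, b)
    # Clamp tiny negative values caused by rounding before taking the root
    return np.sqrt(np.maximum(sq, 0))

# The arrays from the question, built row by row
a = np.array([[(0, 255, 0), (255, 255, 0), (0, 255, 0)]] * 3)
b = np.array([[(255, 255, 0), (255, 255, 0), (0, 255, 0)]] * 3)
result = elementwise_cdist_v1_safe(a, b)
print(result)
```

On the question's data this should agree with the other approaches, since only the first column of vectors differs between a and b.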
Approach 2
This one uses np.einsum and is implemented along similar lines:
def elementwise_cdist_v2(a,b):
    # Difference vectors, then sum of squares along the last axis
    d = a-b
    return np.sqrt(np.einsum('ijk,ijk->ij',d,d))
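The same subtract-then-reduce idea can also be written with np.linalg.norm over the last axis. The sketch below is an alternative, not the answer's own code, and the helper name elementwise_cdist_norm is hypothetical. It also casts to float first, on the assumption that the inputs might be uint8 (for example RGB pixels loaded from an image), where a-b wraps around modulo 256 instead of going negative.

```python
import numpy as np

def elementwise_cdist_norm(a, b):
    # Cast before subtracting so unsigned integer inputs cannot wrap around
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.linalg.norm(d, axis=-1)

# One pair of RGB-like vectors stored as uint8
a_u8 = np.array([[[0, 255, 0]]], dtype=np.uint8)
b_u8 = np.array([[[255, 255, 0]]], dtype=np.uint8)

# Subtracting the uint8 arrays directly wraps 0 - 255 around to 1
naive = np.linalg.norm(a_u8 - b_u8, axis=-1)
safe = elementwise_cdist_norm(a_u8, b_u8)
print("naive:", naive, "safe:", safe)
```

With arrays built from plain Python integers, as in the question, NumPy picks a signed integer dtype, so the wrap-around only matters for unsigned data.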
Timings on large arrays:
We use random data whose last axis has length 3, which is the common case when working with xyz coordinate data.
In [72]: np.random.seed(0)
...: a = np.random.rand(1000,1000,3)
...: b = np.random.rand(1000,1000,3)
In [73]: %timeit elementwise_cdist_v1(a,b)
10 loops, best of 3: 23.9 ms per loop
In [74]: %timeit elementwise_cdist_v2(a,b)
100 loops, best of 3: 13.2 ms per loop
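The timings above depend on the machine and library versions. To repeat the comparison outside IPython, something like the following sketch could be used. It relies on the standard timeit module, and the array size (200x200x3) and repeat count are arbitrary choices of mine to keep the run short. It also checks that both functions agree before timing them.

```python
import timeit
import numpy as np

def elementwise_cdist_v1(a, b):
    s_a = np.einsum('ijk,ijk->ij', a, a)
    s_b = np.einsum('ijk,ijk->ij', b, b)
    return np.sqrt(s_a + s_b - 2 * np.einsum('ijk,ijk->ij', a, b))

def elementwise_cdist_v2(a, b):
    d = a - b
    return np.sqrt(np.einsum('ijk,ijk->ij', d, d))

# Smaller than the arrays timed above so the check finishes quickly
rng = np.random.default_rng(0)
a = rng.random((200, 200, 3))
b = rng.random((200, 200, 3))

# Both approaches should give the same distances up to rounding
results_match = np.allclose(elementwise_cdist_v1(a, b), elementwise_cdist_v2(a, b))

repeats = 20
t_v1 = timeit.timeit(lambda: elementwise_cdist_v1(a, b), number=repeats)
t_v2 = timeit.timeit(lambda: elementwise_cdist_v2(a, b), number=repeats)
print("results match:", results_match)
print(f"v1: {t_v1 / repeats * 1e3:.3f} ms per call")
print(f"v2: {t_v2 / repeats * 1e3:.3f} ms per call")
```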
Answer 1 (score: 0)
Flatten the input arrays to two dimensions, and cdist can then be used.
import numpy as np
from scipy.spatial.distance import cdist
a = np.array([
(0,255,0),
(255,255,0),
(0,255,0),
(0,255,0),
(255,255,0),
(0,255,0),
(0,255,0),
(255,255,0),
(0,255,0),
])
b = np.array([
(255,255,0),
(255,255,0),
(0,255,0),
(255,255,0),
(255,255,0),
(0,255,0),
(255,255,0),
(255,255,0),
(0,255,0),
])
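dist_matrix = cdist(a, b)
# Only the diagonal holds the distances between corresponding rows
pair_dist = np.diag(dist_matrix, 0)
dist3x3 = np.reshape(pair_dist, (3, 3))
print("dist_matrix\n", dist_matrix)
print("pair_dist\n", pair_dist)
print("dist3x3\n", dist3x3)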
Output:
dist_matrix
[[255. 255. 0. 255. 255. 0. 255. 255. 0.]
[ 0. 0. 255. 0. 0. 255. 0. 0. 255.]
[255. 255. 0. 255. 255. 0. 255. 255. 0.]
[255. 255. 0. 255. 255. 0. 255. 255. 0.]
[ 0. 0. 255. 0. 0. 255. 0. 0. 255.]
[255. 255. 0. 255. 255. 0. 255. 255. 0.]
[255. 255. 0. 255. 255. 0. 255. 255. 0.]
[ 0. 0. 255. 0. 0. 255. 0. 0. 255.]
[255. 255. 0. 255. 255. 0. 255. 255. 0.]]
pair_dist
[255. 0. 0. 255. 0. 0. 255. 0. 0.]
dist3x3
[[255. 0. 0.]
[255. 0. 0.]
[255. 0. 0.]]
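Note that this approach builds the full N x N matrix and keeps only its N diagonal entries, where N is the number of vectors. A rough back-of-the-envelope sketch of the intermediate memory cost follows. The helper name cdist_matrix_bytes is hypothetical, and it assumes 8-byte float64 entries, which is what cdist returns.

```python
def cdist_matrix_bytes(rows, cols, itemsize=8):
    # A rows x cols grid of vectors flattens to n = rows * cols points,
    # and cdist between two such sets produces an n x n matrix
    n = rows * cols
    return n * n * itemsize

small = cdist_matrix_bytes(3, 3)        # the 3x3 example above
large = cdist_matrix_bytes(1000, 1000)  # a 1000x1000 grid of vectors
print(small, "bytes")
print(large / 1e12, "TB")
```

For the 3x3 example this is negligible (81 entries), but for a 1000x1000 grid the intermediate matrix would need on the order of terabytes, so the diagonal trick is only practical for small inputs.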
Answer 2 (score: 0)
First, a simple NumPy solution for the Euclidean case:
>>> np.sqrt(np.sum((a-b)**2, axis=2))
array([[255., 0., 0.],
[255., 0., 0.],
[255., 0., 0.]])
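The question also mentions the 'cityblock' and 'sqeuclidean' metrics. The same pattern (subtract, then reduce along the last axis) extends to those. The following is a sketch under my own assumptions: the function name elementwise_distance and its metric argument are hypothetical and only mimic the metric names that cdist accepts.

```python
import numpy as np

def elementwise_distance(a, b, metric='euclidean'):
    # Difference vectors between corresponding elements, as floats
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    if metric == 'euclidean':
        return np.sqrt((d ** 2).sum(axis=-1))
    if metric == 'sqeuclidean':
        return (d ** 2).sum(axis=-1)
    if metric == 'cityblock':
        return np.abs(d).sum(axis=-1)
    raise ValueError(f"unsupported metric: {metric}")

# The arrays from the question
a = np.array([[(0, 255, 0), (255, 255, 0), (0, 255, 0)]] * 3)
b = np.array([[(255, 255, 0), (255, 255, 0), (0, 255, 0)]] * 3)

for name in ('euclidean', 'sqeuclidean', 'cityblock'):
    print(name)
    print(elementwise_distance(a, b, name))
```

Other metrics would need their own branch, so this is only a starting point and not a replacement for everything cdist supports.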
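You said you would like to use cdist instead. Note that using cdist for element-wise distances is wasteful, because it computes the distance between every pair of vectors, not just corresponding ones. If performance is not a concern, though, try the following: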
>>> np.diag(cdist(a.reshape(-1, 3), b.reshape(-1, 3), 'euclidean')).reshape(-1, 3)
array([[255., 0., 0.],
[255., 0., 0.],
[255., 0., 0.]])
Edit: a solution whose memory requirements scale more reasonably with the array size:
>>> np.array([
...     cdist(x, y, 'euclidean')
...     for (x, y) in zip(a.reshape(-1, 1, 3), b.reshape(-1, 1, 3))
... ]).reshape(-1, 3)
array([[255., 0., 0.],
[255., 0., 0.],
[255., 0., 0.]])
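If the goal is mainly to reuse SciPy's metric implementations, the per-pair loop above can also be written with the two-vector functions in scipy.spatial.distance (euclidean, cityblock, sqeuclidean and so on) instead of calling cdist on 1x3 slices. This is a sketch, not a tuned implementation: the wrapper name elementwise_scipy is hypothetical, and the Python-level loop will be slow on large arrays.

```python
import numpy as np
from scipy.spatial import distance

def elementwise_scipy(a, b, func=distance.euclidean):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Flatten every leading axis so each row is one vector
    a2 = a.reshape(-1, a.shape[-1])
    b2 = b.reshape(-1, b.shape[-1])
    # Apply the two-vector metric to corresponding rows only
    out = np.array([func(u, v) for u, v in zip(a2, b2)])
    return out.reshape(a.shape[:-1])

# The arrays from the question
a = np.array([[(0, 255, 0), (255, 255, 0), (0, 255, 0)]] * 3)
b = np.array([[(255, 255, 0), (255, 255, 0), (0, 255, 0)]] * 3)

euclid = elementwise_scipy(a, b)
manhattan = elementwise_scipy(a, b, distance.cityblock)
print(euclid)
print(manhattan)
```

Memory use stays proportional to the number of vectors, at the cost of one Python function call per pair.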