I have two numpy arrays, as follows.
import numpy as np
X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865, -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523, -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])
These are the x and y coordinates of 10 users. I need to find the similarity between every pair of users. For example:
x1 = -0.34095692
y1 = 0.16305762
x2 = -0.34044722
y2 = 0.38554548
Euclidean distance = (|x1-y1|^2 + |x2-y2|^2)^1/2
So in the end I want to get a matrix of these values for every pair of users. Please help me achieve this.
Answer 0 (score: 2)
A short snippet that does the job:
A = (X-Y)**2
p, q = np.meshgrid(np.arange(10), np.arange(10))
np.sqrt(A[p]-A[q])
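Edit: explanation
A is just a precomputed vector containing all the squared differences.
np.meshgrid: the purpose of this function is to generate all pairs of values from two arrays. It is not the best solution, because you get the whole matrix, but for the number of samples you have that is not a big deal. The generated values are used as indices into A.
A[p] is also a bit of magic (NumPy integer-array indexing). Try it yourself to see how it behaves.
The result contains nan wherever A[p] - A[q] is negative, but that is what you asked for. A true Euclidean distance uses +, not -.
For reference, p and q look like this:
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],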
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
[6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
[7, 7, 7, 7, 7, 7, 7, 7, 7, 7],
[8, 8, 8, 8, 8, 8, 8, 8, 8, 8],
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]])
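The same meshgrid-plus-indexing idea can also produce the true point-to-point Euclidean distance. The following is only a sketch, not part of the original answer: it assumes each user is the point (X[i], Y[i]), and the name dists is chosen here for illustration.

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

n = len(X)
# p[i, j] == j and q[i, j] == i, so together they enumerate every index pair.
p, q = np.meshgrid(np.arange(n), np.arange(n))

# Integer-array indexing: X[p] has the same shape as p, with X[p][i, j] == X[j].
# np.hypot(a, b) computes sqrt(a**2 + b**2) element-wise.
dists = np.hypot(X[p] - X[q], Y[p] - Y[q])

print(dists.shape)  # (n, n) matrix, symmetric with a zero diagonal
```

Because the squared terms are added rather than subtracted, the argument of the square root is never negative and no nan values appear.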
Answer 1 (score: 2)
Use zip(X, Y) to get the coordinate pairs. If you want the Euclidean distance between points, the formula should be (|x1-x2|^2 + |y1-y2|^2)^0.5, not (|x1-y1|^2 - |x2-y2|^2)^1/2:
In [125]: coords = list(zip(X, Y))  # list() is needed on Python 3, where zip returns an iterator
In [126]: from scipy import spatial
...: dists=spatial.distance.cdist(coords, coords)
In [127]: dists
Out[127]:
array([[ 0. , 0.22248844, 0.09104884, 0.75377329, 0.10685954,
0.41534165, 0.5109039 , 0.15149362, 0.19490308, 0.58971785],
[ 0.22248844, 0. , 0.28973034, 0.9737061 , 0.23197262,
0.62852005, 0.73270705, 0.09751671, 0.39258852, 0.81219719],
[ 0.09104884, 0.28973034, 0. , 0.68642072, 0.19047682,
0.33880688, 0.45038919, 0.23539542, 0.1064197 , 0.53629553],
[ 0.75377329, 0.9737061 , 0.68642072, 0. , 0.79415038,
0.35411306, 0.24770988, 0.90290761, 0.59283795, 0.20443561],
[ 0.10685954, 0.23197262, 0.19047682, 0.79415038, 0. ,
0.47665258, 0.54665574, 0.13560014, 0.28381556, 0.61376196],
[ 0.41534165, 0.62852005, 0.33880688, 0.35411306, 0.47665258,
0. , 0.15477091, 0.56683251, 0.24003205, 0.25201351],
[ 0.5109039 , 0.73270705, 0.45038919, 0.24770988, 0.54665574,
0.15477091, 0. , 0.65808357, 0.36700881, 0.09751671],
[ 0.15149362, 0.09751671, 0.23539542, 0.90290761, 0.13560014,
0.56683251, 0.65808357, 0. , 0.34181257, 0.73270705],
[ 0.19490308, 0.39258852, 0.1064197 , 0.59283795, 0.28381556,
0.24003205, 0.36700881, 0.34181257, 0. , 0.45902146],
[ 0.58971785, 0.81219719, 0.53629553, 0.20443561, 0.61376196,
0.25201351, 0.09751671, 0.73270705, 0.45902146, 0. ]])
To get the upper triangle of this array, use numpy.triu:
In [128]: np.triu(dists)
Out[128]:
array([[ 0. , 0.22248844, 0.09104884, 0.75377329, 0.10685954,
0.41534165, 0.5109039 , 0.15149362, 0.19490308, 0.58971785],
[ 0. , 0. , 0.28973034, 0.9737061 , 0.23197262,
0.62852005, 0.73270705, 0.09751671, 0.39258852, 0.81219719],
[ 0. , 0. , 0. , 0.68642072, 0.19047682,
0.33880688, 0.45038919, 0.23539542, 0.1064197 , 0.53629553],
[ 0. , 0. , 0. , 0. , 0.79415038,
0.35411306, 0.24770988, 0.90290761, 0.59283795, 0.20443561],
[ 0. , 0. , 0. , 0. , 0. ,
0.47665258, 0.54665574, 0.13560014, 0.28381556, 0.61376196],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0.15477091, 0.56683251, 0.24003205, 0.25201351],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.65808357, 0.36700881, 0.09751671],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.34181257, 0.73270705],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.45902146],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ]])
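np.triu keeps the zero diagonal and fills the lower triangle with zeros. If only the distinct user pairs are of interest, one possible follow-up is to pull them out with np.triu_indices and k=1. This is a sketch under assumptions: dists is rebuilt here with plain NumPy broadcasting so the block is self-contained, and pair_dists and closest are illustrative names.

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

# One row per user, columns are x and y.
points = np.column_stack((X, Y))

# Pairwise differences via broadcasting: (10, 1, 2) - (1, 10, 2) -> (10, 10, 2).
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
dists = np.sqrt((diff ** 2).sum(axis=-1))

# Row and column indices of the entries strictly above the diagonal (i < j).
i, j = np.triu_indices(len(points), k=1)
pair_dists = dists[i, j]  # one distance per unordered pair of users

# Index of the most similar (closest) pair of users.
closest = np.argmin(pair_dists)
print(i[closest], j[closest], pair_dists[closest])
```

For 10 users this yields 10 * 9 / 2 = 45 distances, each pair counted once.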