Here is my data:
a = (9,5,3)
b = (5,3,6)
c = (1,6,6)
d = (2,5,0)
e = (9,8,3)
f = (7,3,6)
g = (2,15,1)
data = [a,b,c,d,e,f,g]
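I have 7 data points, and I want to pick three of them (top-k = 3). They could be a, b, c or any other points, as long as the chosen points are as far from each other as possible, i.e. the top-k most diverse points.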
from scipy.spatial import distance
d = distance.euclidean(a,b)
k = 3
i = 1
distancelist = []
max_dist = []
while (i < k):
    for x in (data):
        for y in (data):
            dist = distance.euclidean(x,y)
            distancelist.append(dist)
    # stuck in here
    max_dist = #
    i = i+1
print(max_dist)
I am stuck on how to get the maximum distance value and then store it in max_dist.
Expected output:
[(9, 8, 3),(2, 15, 1),(5, 3, 6)] #I just choose these as random, I don't know the exact result
For example:
First subset: total distance 18.987490074177131
# combination (a,b,c) or [(9,5,3),(5,3,6),(1,6,6)]
distance.euclidean(data[0], data[1]) + distance.euclidean(data[1], data[2]) + distance.euclidean(data[0], data[2])
Second subset: total distance 20.000937912998413
# combination (a,b,d) or [(9,5,3),(5,3,6),(2,5,0)]
distance.euclidean(data[0], data[1]) + distance.euclidean(data[1], data[3]) + distance.euclidean(data[0], data[3])
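The second subset is better than the first because its total distance is larger. I want the subset (top-k = 3) whose total distance is the largest among all combinations.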
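The objective described above can be written as a small helper. This is only a sketch: total_distance is a hypothetical name, and math.dist (standard library, Python 3.8+) is assumed as a stand-in for scipy's distance.euclidean so that the snippet runs without third-party packages.

```python
from itertools import combinations
from math import dist  # stdlib Euclidean distance, assumed in place of scipy

data = [(9, 5, 3), (5, 3, 6), (1, 6, 6), (2, 5, 0),
        (9, 8, 3), (7, 3, 6), (2, 15, 1)]


def total_distance(subset):
    """Sum of Euclidean distances over every pair of points in subset."""
    return sum(dist(p, q) for p, q in combinations(subset, 2))


first = total_distance((data[0], data[1], data[2]))   # combination (a, b, c)
second = total_distance((data[0], data[1], data[3]))  # combination (a, b, d)
print(first, second)
```

The two printed values correspond to the first and second subsets listed above, so the second one is the larger.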
Answer 0: (score: 1)
How about the following.
First, put every distance together with its points into max_dixdance. Here, all pairs (and, in the modified version, all triples) are generated by combinations rather than by a double loop:
from scipy.spatial import distance
from itertools import combinations
max_dixdance = []
# for x, y in combinations(data, 2):
#     dis = distance.euclidean(x, y)
#     max_dixdance.append((dis, (x, y)))
## modified version
for xyz in combinations(data, 3):
    # print(list(xyz))  # verify all combinations appeared
    # calculate the sum of all pairwise distances
    dis = 0
    for xy in combinations(xyz, 2):
        # print(list(xy))  # verify all pairs appeared
        dis += distance.euclidean(*xy)
    max_dixdance.append((dis, tuple(xyz)))
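Then sort the list by the dis value and take the first 3 elements. The loop above is almost (not exactly) equivalent to the following: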
## modified version - 2
for x, y, z in combinations(data, 3):
    xyz = (x, y, z)
    # calculate the sum of all pairwise distances
    dis = 0
    for x, y in combinations(xyz, 2):
        dis += distance.euclidean(x, y)
    max_dixdance.append((dis, xyz))
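The sorting step mentioned in this answer has no code attached to it. A minimal sketch of how it might look follows; the name top3 is made up for illustration, and math.dist (standard library, Python 3.8+) is assumed in place of distance.euclidean so the snippet is self-contained.

```python
from itertools import combinations
from math import dist  # stdlib Euclidean distance, assumed in place of scipy

data = [(9, 5, 3), (5, 3, 6), (1, 6, 6), (2, 5, 0),
        (9, 8, 3), (7, 3, 6), (2, 15, 1)]

# Build (total pairwise distance, triple) entries, as in the loops above.
max_dixdance = []
for xyz in combinations(data, 3):
    dis = sum(dist(x, y) for x, y in combinations(xyz, 2))
    max_dixdance.append((dis, xyz))

# Sort on the distance only, largest first, and keep the first 3 entries.
max_dixdance.sort(key=lambda entry: entry[0], reverse=True)
top3 = max_dixdance[:3]

for dis, xyz in top3:
    print(round(dis, 3), xyz)
```

Here top3[0] holds the single most spread-out triple; sorting on entry[0] alone avoids comparing the point tuples when two totals tie.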
Answer 1: (score: 1)
Brute force without scipy, using max with a key function:
from itertools import combinations
def dist2(points):  # distance between 2 points
    return sum((a_ - b_)**2 for a_, b_ in zip(*points))**0.5
def dist3(points):  # sum of triangle sides for 3 points
    return sum(map(dist2, combinations(points, 2)))
>>> max(combinations(data, 3), key=dist3)
((2, 5, 0), (7, 3, 6), (2, 15, 1))
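Since the question talks about top-k in general, the same brute-force idea can be parameterised on k. The sketch below is an assumption-laden illustration rather than part of the answer: most_diverse and total_distance are hypothetical names, and math.dist requires Python 3.8+.

```python
from itertools import combinations
from math import dist  # stdlib Euclidean distance, Python 3.8+

data = [(9, 5, 3), (5, 3, 6), (1, 6, 6), (2, 5, 0),
        (9, 8, 3), (7, 3, 6), (2, 15, 1)]


def total_distance(points):
    """Sum of Euclidean distances over every pair in points."""
    return sum(dist(p, q) for p, q in combinations(points, 2))


def most_diverse(points, k):
    """Return the k-subset with the largest total pairwise distance."""
    return max(combinations(points, k), key=total_distance)


best_triple = most_diverse(data, 3)
best_pair = most_diverse(data, 2)
print(best_triple)
print(best_pair)
```

This examines every one of the C(n, k) subsets, so it is only practical for small inputs such as the 7 points here.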
Answer 2: (score: 0)
Here is my understanding of the problem: for each point, get the three largest distances, i.e.
import numpy as np
import scipy.spatial.distance
# `cdist` gives the distance from every point to every other point.
mat = scipy.spatial.distance.cdist(data, data, metric='euclidean')
# 0 1 2 3 4 5 6
#0 0.000000 5.385165 8.602325 7.615773 3.000000 4.123106 12.369317
#1 5.385165 0.000000 5.000000 7.000000 7.071068 2.000000 13.341664
#2 8.602325 5.000000 0.000000 6.164414 8.774964 6.708204 10.344080
#3 7.615773 7.000000 6.164414 0.000000 8.185353 8.062258 10.049876
#4 3.000000 7.071068 8.774964 8.185353 0.000000 6.164414 10.099505
#5 4.123106 2.000000 6.708204 8.062258 6.164414 0.000000 13.928388
#6 12.369317 13.341664 10.344080 10.049876 10.099505 13.928388 0.000000
#this is for mapping
di = dict(zip(np.arange(7),list('abcdefg')))
#Get top three distances indices using argsort
max3 = mat.argsort(1)[:,-3:]
#map the indices with the names
max3_with_names = np.array(np.vectorize(di.get)(max3)).tolist()
# [['d', 'c', 'g'],
# ['d', 'e', 'g'],
# ['a', 'e', 'g'],
# ['f', 'e', 'g'],
# ['d', 'c', 'g'],
# ['c', 'd', 'g'],
# ['a', 'b', 'f']]
list(zip(list('abcdefg'),max3_with_names))
# [('a', ['d', 'c', 'g']),# d, c, g are the 3 points farthest from a.
# ('b', ['d', 'e', 'g']),
# ('c', ['a', 'e', 'g']),
# ('d', ['f', 'e', 'g']),
# ('e', ['d', 'c', 'g']),
# ('f', ['c', 'd', 'g']),
# ('g', ['a', 'b', 'f'])]