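In the figure below, I have two clusters of data. For a new data point (A), can I get the distance from A to the farthest point of each cluster (circled in red) and to the nearest point of each cluster (circled in purple)?
Simply put, for each cluster I need the distance from A (the new point) to that cluster's nearest and farthest points.
Does the sklearn library provide a function for this, or do I have to do it manually?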
Answer 0 (score: 2)
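The points you marked are not actually the closest and farthest ones. The point you circled as closest in the green class only looks close because your two axes are scaled differently. By Euclidean distance, that point is not the closest one.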
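To illustrate the scaling effect, here is a minimal sketch with made-up numbers (the points `P` and `Q` and the axis spans are assumptions for illustration, not values from the figure). It compares the raw Euclidean distance with the distance after dividing each axis by its plotted span, which roughly mimics what the eye sees on a stretched plot.

```python
import numpy as np

# Hypothetical values: A is the new point, P and Q are two candidates.
a = np.array([0.0, 0.0])
points = np.array([[1.5, 10.0],   # P
                   [0.2, 40.0]])  # Q

# Assumed plotted span of each axis (x covers 2 units, y covers 200 units).
axis_span = np.array([2.0, 200.0])

# Plain Euclidean distance in data units.
raw_dists = np.linalg.norm(points - a, axis=1)

# Distance after rescaling each axis by its span, i.e. "as drawn".
visual_dists = np.linalg.norm((points - a) / axis_span, axis=1)

raw_nearest = int(np.argmin(raw_dists))        # index of nearest by raw distance
visual_nearest = int(np.argmin(visual_dists))  # index of nearest "by eye"

print("raw distances:   ", raw_dists, "-> nearest index", raw_nearest)
print("visual distances:", visual_dists, "-> nearest index", visual_nearest)
```

With these numbers the two notions disagree: `P` is nearest in data units, while `Q` looks nearest on the plot. If the "by eye" notion is the one you actually want, rescaling the features before computing distances is one option.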
Other than that, yes, you need to implement it yourself. Here is some sample code:
Code:
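import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import euclidean_distances

# Training data and a 2-cluster k-means fit
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
kmeans.predict([[0, 0], [4, 4]])  # cluster assignment for new points (result unused here)

# New points, and their distances to every training point
data = np.array([[5, 0], [-4, 10], [0, 3]])
dists = euclidean_distances(data, X)  # shape: (len(data), len(X))

for i in range(len(data)):
    print("data: %s" % str(data[i, :]))
    for x in range(kmeans.n_clusters):
        # Restrict to the members of cluster x
        mask = kmeans.labels_ == x
        cluster_dists = dists[i, mask]
        min_dist = cluster_dists.min()
        max_dist = cluster_dists.max()
        # All members at the min/max distance (ties are printed together)
        print("cluster: %d\n\tcloses: %s: %g\n\tfarthest: %s: %g"
              % (x,
                 str(X[mask][cluster_dists == min_dist, :]),
                 min_dist,
                 str(X[mask][cluster_dists == max_dist, :]),
                 max_dist))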
Output:
data: [5 0]
cluster: 0
    closes: [[1 0]]: 4
    farthest: [[1 4]]: 5.65685
cluster: 1
    closes: [[4 0]]: 1
    farthest: [[4 4]]: 4.12311
data: [-4 10]
cluster: 0
    closes: [[1 4]]: 7.81025
    farthest: [[1 0]]: 11.1803
cluster: 1
    closes: [[4 4]]: 10
    farthest: [[4 0]]: 12.8062
data: [0 3]
cluster: 0
    closes: [[1 2]
 [1 4]]: 1.41421
    farthest: [[1 0]]: 3.16228
cluster: 1
    closes: [[4 2]
 [4 4]]: 4.12311
    farthest: [[4 0]]: 5
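If you want the same result as values rather than printed text, the loop above can be wrapped in a small helper. The following is only a sketch: `nearest_and_farthest` is a hypothetical name, not an sklearn function, and the labels are hard-coded on the assumption that they match the cluster numbering in the output above (in practice you would pass `kmeans.labels_`).

```python
import numpy as np

def nearest_and_farthest(point, X, labels):
    """Hypothetical helper: for each cluster, return
    (nearest_point, nearest_dist, farthest_point, farthest_dist)."""
    point = np.asarray(point, dtype=float)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    result = {}
    for c in np.unique(labels):
        members = X[labels == c]                      # points in cluster c
        d = np.linalg.norm(members - point, axis=1)   # Euclidean distances
        i_min, i_max = int(np.argmin(d)), int(np.argmax(d))
        result[int(c)] = (members[i_min], float(d[i_min]),
                          members[i_max], float(d[i_max]))
    return result

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
# Assumed labels, chosen to match the cluster numbering shown in the output.
labels = np.array([0, 0, 0, 1, 1, 1])

res = nearest_and_farthest([5, 0], X, labels)
for c, (near_pt, near_d, far_pt, far_d) in res.items():
    print("cluster %d: nearest %s (%g), farthest %s (%g)"
          % (c, near_pt, near_d, far_pt, far_d))
```

Note that `np.argmin` and `np.argmax` return only the first match, so when several points tie (as for the point `[0 3]` above), this sketch reports one of them instead of all.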