Distance to the farthest and nearest points of each cluster - kmeans

Date: 2018-04-06 19:50:39

Tags: python-3.x scikit-learn k-means

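In the figure below, I have two clusters of data. For a new data point (A), can I get, for each cluster, the distance from A to the farthest point (circled in red) and the distance to the nearest point (circled in purple)?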

Simply put, for each cluster I need the distance from A (the new point) to that cluster's nearest point and to its farthest point.

Does scikit-learn provide a function for this, or do I have to compute it manually?

[Figure: two clusters and the new point A; farthest points circled in red, nearest points circled in purple]
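Stated formally (the notation here is introduced only for clarity and is not from the original post), for a new point A and a cluster C_k, meaning the set of training points assigned to cluster k, the two quantities asked for are:

```latex
d_{\min}(A, C_k) = \min_{x \in C_k} \lVert A - x \rVert_2,
\qquad
d_{\max}(A, C_k) = \max_{x \in C_k} \lVert A - x \rVert_2
```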

1 answer:

Answer (score: 2)

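The points you marked are not actually the nearest and farthest ones. The point you circled as nearest in the green cluster only looks close because of how your two axes are scaled; by Euclidean distance it is not the nearest point.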
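A quick numeric illustration of the scaling point, using made-up coordinates (the data behind the original figure is not available): if the y-axis spans roughly 0 to 1000 while the x-axis spans only 0 to 10, point p below is drawn much nearer to A than q is, yet its Euclidean distance is about twice as large.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

# Hypothetical coordinates, chosen only to illustrate axis scaling.
A = np.array([[0.0, 0.0]])
p = [0.5, 20.0]   # close in x, far in y: looks close if the y-axis is very stretched
q = [10.0, 1.0]   # far in x, close in y: looks far on a narrow x-axis

# Row 0 of the result holds the distances from A to p and to q.
d = euclidean_distances(A, np.array([p, q]))[0]
print("dist(A, p) = %.3f" % d[0])  # dist(A, p) = 20.006
print("dist(A, q) = %.3f" % d[1])  # dist(A, q) = 10.050
```

So before trusting what the plot suggests, compare the actual distances (or plot with equal axis scaling).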

Apart from that, yes, you need to implement this yourself. Here is some example code:

Code:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import euclidean_distances

# Two obvious groups: points with x = 1 and points with x = 4
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# n_init=10 matches the older default number of restarts
# (newer scikit-learn releases default to n_init="auto")
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)

# New points to measure against each cluster
data = np.array([[5, 0], [-4, 10], [0, 3]])

# dists[i, j] is the Euclidean distance from data[i] to X[j]
dists = euclidean_distances(data, X)

for i in range(len(data)):
    print("data: %s" % str(data[i, :]))
    for x in range(kmeans.n_clusters):
        mask = kmeans.labels_ == x        # training points in cluster x
        members = X[mask]
        cluster_dists = dists[i, mask]
        min_dist = cluster_dists.min()
        max_dist = cluster_dists.max()
        # Match only among this cluster's members, so an equally distant
        # point from another cluster is not reported by mistake
        print("cluster: %d\n\tclosest: %s: %g\n\tfarthest: %s: %g"
              % (x,
                 str(members[cluster_dists == min_dist]),
                 min_dist,
                 str(members[cluster_dists == max_dist]),
                 max_dist))

Output:

data: [5 0]
cluster: 0
    closest: [[1 0]]: 4
    farthest: [[1 4]]: 5.65685
cluster: 1
    closest: [[4 0]]: 1
    farthest: [[4 4]]: 4.12311
data: [-4 10]
cluster: 0
    closest: [[1 4]]: 7.81025
    farthest: [[1 0]]: 11.1803
cluster: 1
    closest: [[4 4]]: 10
    farthest: [[4 0]]: 12.8062
data: [0 3]
cluster: 0
    closest: [[1 2]
 [1 4]]: 1.41421
    farthest: [[1 0]]: 3.16228
cluster: 1
    closest: [[4 2]
 [4 4]]: 4.12311
    farthest: [[4 0]]: 5
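On the "does scikit-learn provide this" part: KMeans.transform returns each point's distance to the cluster centres, not to individual cluster members, so it does not answer the question directly. A partly vectorised variant of the loop above might look like the sketch below. The helper name nearest_and_farthest is hypothetical; sklearn.metrics.pairwise_distances_argmin_min covers the nearest member, and the farthest member is taken with argmax over a distance matrix because, as far as I know, scikit-learn has no dedicated counterpart for the maximum. Unlike the loop above, ties are resolved by returning only the first matching index, and the cluster numbering may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min
from sklearn.metrics.pairwise import euclidean_distances


def nearest_and_farthest(points, X, labels, n_clusters):
    """Hypothetical helper: for every cluster k, return
    (near_idx, near_dist, far_idx, far_dist), each an array with one
    entry per query point; the indices refer to rows of X."""
    result = {}
    for k in range(n_clusters):
        member_idx = np.flatnonzero(labels == k)  # rows of X in cluster k
        members = X[member_idx]
        # Nearest member: argmin and min distance in one call
        near_local, near_dist = pairwise_distances_argmin_min(points, members)
        # Farthest member: argmax over the full distance matrix
        d = euclidean_distances(points, members)
        far_local = d.argmax(axis=1)
        far_dist = d[np.arange(len(points)), far_local]
        # Map positions within the cluster back to row indices of X
        result[k] = (member_idx[near_local], near_dist,
                     member_idx[far_local], far_dist)
    return result


X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)
data = np.array([[5, 0], [-4, 10], [0, 3]])

res = nearest_and_farthest(data, X, kmeans.labels_, kmeans.n_clusters)
for k, (ni, nd, fi, fd) in res.items():
    for i, p in enumerate(data):
        print("data %s, cluster %d: nearest %s (%g), farthest %s (%g)"
              % (p, k, X[ni[i]], nd[i], X[fi[i]], fd[i]))
```

Returning indices rather than the points themselves keeps the result compact and lets you look up any other per-point information you store alongside X.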