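I am using sklearn's k-means clustering, and I would like to know how to compute and store the distance from each point in my data to its nearest cluster center, for later use. My code: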
    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.sparse as sp
    from sklearn.metrics.pairwise import euclidean_distances
    from datetime import datetime
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    def learn(records):
        # getDataFromTransaction is my own helper (not shown here)
        data = [getDataFromTransaction(t) for t in records]
        batch_size = 45
        X = np.array(data)
        centers = [[1, 1, 1], [-1, -1, -1], [1, -1, 1]]
        n_clusters = len(centers)
        # X, labels_true = make_blobs(n_samples=20, centers=centers,
        #                             cluster_std=0.7)

        ##############################################################################
        # Compute clustering with KMeans
        k_means = KMeans(init='k-means++', n_clusters=3, n_init=10)
        k_means.fit(X)
        k_means_labels = k_means.labels_
        k_means_cluster_centers = k_means.cluster_centers_
        k_means_labels_unique = np.unique(k_means_labels)

        # Plot the first two features of each cluster and its center
        colors = ['#4EACC5', '#FF9C34', '#4E9A06']
        plt.figure()
        for k, col in zip(range(n_clusters), colors):
            my_members = k_means_labels == k
            cluster_center = k_means_cluster_centers[k]
            plt.plot(X[my_members, 0], X[my_members, 1], 'w',
                     markerfacecolor=col, marker='.')
            plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
                     markeredgecolor='k', markersize=6)
        plt.title('KMeans')
        plt.grid(True)
        plt.savefig('./clusteringk_.png')
        plt.show()
Sorry about the poor formatting, and thanks for any help.
Answer 0 (score: 2)
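In k-means, each point is assigned to the cluster whose center is nearest, which minimizes the sum of squared deviations from the cluster centers. So all you need to do is take the Euclidean norm of the difference between each point and the center of the cluster that k-means assigned it to.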
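Written out, the quantity described above looks roughly like this. The notation is an assumption introduced here for illustration, not taken from the original answer: x_i is the i-th data point, \mu_j is the center of cluster j, c(i) is the label k-means assigned to x_i, and d_i is the distance to store.

```latex
c(i) = \arg\min_{j} \lVert x_i - \mu_j \rVert_2,
\qquad
d_i = \lVert x_i - \mu_{c(i)} \rVert_2 = \min_{j} \lVert x_i - \mu_j \rVert_2,
\qquad
\text{inertia} = \sum_{i} d_i^{2}
```

The last sum is the objective that k-means minimizes; in sklearn it is exposed on a fitted model as `inertia_`.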
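Here is a sketch in NumPy, using the variable names from the question's code:

    distances = np.empty(len(X))
    for i in range(n_clusters):
        in_cluster = k_means_labels == i  # mask of points assigned to cluster i
        distances[in_cluster] = np.linalg.norm(
            X[in_cluster] - k_means_cluster_centers[i], axis=1)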
You can then add the distances to your data as an additional column.
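As a possible alternative to looping over clusters, here is a minimal self-contained sketch. It relies on `KMeans.transform`, which returns the distance from every point to every cluster center, so the row-wise minimum is the distance to the nearest center. The synthetic data from `make_blobs` and the variable names are assumptions standing in for the asker's `X`; this is one way to do it, not necessarily what the answerer had in mind.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the asker's X (assumption: any 2-D numeric array works).
X, _ = make_blobs(n_samples=200, centers=3, n_features=3, random_state=0)

k_means = KMeans(init="k-means++", n_clusters=3, n_init=10, random_state=0)
k_means.fit(X)

# transform() gives distances to all centers, shape (n_samples, n_clusters);
# the minimum of each row is the distance to the nearest center.
all_distances = k_means.transform(X)
nearest_distance = all_distances.min(axis=1)

# Cross-check: norm of the difference to the center each point was assigned to.
assigned_centers = k_means.cluster_centers_[k_means.labels_]
by_norm = np.linalg.norm(X - assigned_centers, axis=1)

# Keep the distances next to the data as an additional column.
X_with_distance = np.column_stack([X, nearest_distance])
```

To keep the result for later use, `nearest_distance` can be written to disk with `np.save`, or kept as a column as in `X_with_distance`.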