我有.csv格式的图表数据的测地距离
我想使用多维缩放(MDS)将其缩小为2D并使用Kmedoids将其聚类
这是我的代码:
# coding: utf-8
import numpy as np
import csv
from sklearn import manifold
from sklearn.metrics.pairwise import pairwise_distances
import kmedoidss
rawdata = csv.reader(open('data.csv', 'r').readlines()[1:])
# Process the data into a 2D array, omitting the header row
data, labels = [], []
for row in rawdata:
labels.append(row[1])
data.append([int(i) for i in row[1:]])
#print data
# Now run very basic MDS
# Documentation here: http://scikit-learn.org/dev/modules/generated/sklearn.manifold.MDS.html#sklearn.manifold.MDS
mds = manifold.MDS(n_components=2, dissimilarity="precomputed")
pos = mds.fit_transform(data)
# distance matrix
D = pairwise_distances(pos, metric='euclidean')
# split into c clusters
M, C = kmedoidss.kMedoids(D, 3)
print ('Data awal : ')
for index, point_idx in enumerate(pos, 1):
print(index, point_idx)
print ('\n medoids:' )
for point_idx in M:
print('{} index ke - {} '.format (pos[point_idx], point_idx+1))
print('')
print('clustering result:')
for label in C:
for point_idx in C[label]:
print('cluster- {}:{} index- {}'.format(label, pos[point_idx], point_idx+1))
kmedoidss.py
import numpy as np
import random
def kMedoids(D, k, tmax=100):
# determine dimensions of distance matrix D
m, n = D.shape
# randomly initialize an array of k medoid indices
M = np.sort(np.random.choice(n, k))
# create a copy of the array of medoid indices
Mnew = np.copy(M)
# initialize a dictionary to represent clusters
C = {}
for t in xrange(tmax):
# determine clusters, i. e. arrays of data indices
J = np.argmin(D[:,M], axis=1)
for kappa in range(k):
C[kappa] = np.where(J==kappa)[0]
# update cluster medoids
for kappa in range(k):
J = np.mean(D[np.ix_(C[kappa],C[kappa])],axis=1)
j = np.argmin(J)
Mnew[kappa] = C[kappa][j]
np.sort(Mnew)
# check for convergence
if np.array_equal(M, Mnew):
break
M = np.copy(Mnew)
else:
# final update of cluster memberships
J = np.argmin(D[:,M], axis=1)
for kappa in range(k):
C[kappa] = np.where(J==kappa)[0]
# return results
return M, C
如何将群集结果可视化为基于群集的不同节点颜色的图形?
答案 0 :(得分:0)
你不需要MDS来运行kMedoid - 只需在原始距离矩阵上运行它(kMedoids也可以通过切换min为最大值来使用相似性矩阵。)
仅将MDS用于绘图。
可视化的常用方法是在群集上使用循环,并以不同的颜色绘制每个群集;或使用颜色谓词。 scipy文档中有很多例子。
http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html
colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk']) colors = np.hstack([colors] * 20) y_pred = labels.astype(np.int) plt.scatter(X[:, 0], X[:, 1], color=colors[y_pred].tolist(), s=10)
其中X
是您的pos
变量(2d mds结果),labels
是每个点的整数簇编号。由于您没有将数据包含在标签中,因此您无法获得数据。布局,请考虑使用循环:
for label, pts in C.items():
plt.scatter(pos[pts, 0], pos[pts, 1], color=colors[label])
plt.show()