Question

I have a similarity matrix for four users, and I want to run agglomerative clustering on it. My code looks like this:

import time

import numpy as np
from sklearn.cluster import AgglomerativeClustering

lena = np.matrix('1 1 0 0;1 1 0 0;0 0 1 0.2;0 0 0.2 1')
X = np.reshape(lena, (-1, 1))

print("Compute structured hierarchical clustering...")
st = time.time()
n_clusters = 3  # number of clusters

ward = AgglomerativeClustering(n_clusters=n_clusters,
        linkage='complete').fit(X)
print(ward)
label = np.reshape(ward.labels_, lena.shape)
print("Elapsed time: ", time.time() - st)
print("Number of pixels: ", label.size)
print("Number of clusters: ", np.unique(label).size)
print(label)

The printed labels are:

[[1 1 0 0]
 [1 1 0 0]
 [0 0 1 2]
 [0 0 2 1]]

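Does this mean it returns a list of possible clusterings from which we can pick one, for example [0, 0, 2, 1]? If that is wrong, could you tell me how to run an agglomerative algorithm based on similarity? If it is right, the similarity matrix is huge, so how do I choose the best clustering from such a large list? Thanks.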

Answer 1

I think the problem here is that you are fitting the model on the wrong data:

# This will return a 4x4 matrix (similarity matrix)
lena = np.matrix('1 1 0 0;1 1 0 0;0 0 1 0.2;0 0 0.2 1')

# However this will return 16x1 matrix
X = np.reshape(lena, (-1, 1))

The result you actually get is:

 ward.labels_
 >> array([1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 2, 0, 0, 2, 1])

That is one label for each element of the vector X, not one label per user, so it is not meaningful here.
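A quick check makes this visible. The sketch below uses a plain array in place of np.matrix, and the names `entry_labels` and `lab` are illustrative only. It shows that the model sees 16 one-dimensional samples and simply groups matrix entries that have the same value.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

lena = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 1, 0.2],
                 [0, 0, 0.2, 1]])
X = lena.reshape(-1, 1)
print(X.shape)  # (16, 1): 16 samples with one feature each

entry_labels = AgglomerativeClustering(n_clusters=3,
                                       linkage='complete').fit(X).labels_
print(entry_labels.size)  # 16: one label per matrix entry, not per user

# Show which matrix values ended up under each label
for lab in np.unique(entry_labels):
    print(lab, np.unique(X[entry_labels == lab]))
```

Each label corresponds to one distinct value in the matrix (0, 0.2, or 1), which is why reshaping the labels back to 4x4 just mirrors the similarity matrix.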

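If I understand your question correctly, you want to group the users according to the distance (similarity) between them. In that case, I suggest using spectral clustering like this: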

import numpy as np
from sklearn.cluster import SpectralClustering

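# Plain ndarray: recent scikit-learn versions reject np.matrix input
lena = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0.2], [0, 0, 0.2, 1]])

n_clusters = 3
# Note: with the default affinity ('rbf'), each row is treated as a feature vector
SpectralClustering(n_clusters).fit_predict(lena)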

>> array([1, 1, 0, 2], dtype=int32)
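Since the question asks specifically about agglomerative clustering on a similarity matrix, one option is to convert the similarities to distances and pass them to AgglomerativeClustering as a precomputed matrix. This is a minimal sketch, assuming that 1 - similarity is an acceptable distance for this data. The names `similarity`, `distance`, `keyword`, and `user_labels` are illustrative, and the keyword lookup is only there because the parameter was called `affinity` in older scikit-learn releases and `metric` in newer ones.

```python
import inspect

import numpy as np
from sklearn.cluster import AgglomerativeClustering

similarity = np.array([[1, 1, 0, 0],
                       [1, 1, 0, 0],
                       [0, 0, 1, 0.2],
                       [0, 0, 0.2, 1]])

# Assumption: similarities lie in [0, 1], so 1 - similarity acts as a distance
distance = 1.0 - similarity

# Pick whichever keyword this scikit-learn version accepts for the
# precomputed option ("metric" in newer releases, "affinity" in older ones)
params = inspect.signature(AgglomerativeClustering).parameters
keyword = "metric" if "metric" in params else "affinity"

model = AgglomerativeClustering(n_clusters=3, linkage="complete",
                                **{keyword: "precomputed"})
user_labels = model.fit_predict(distance)
print(user_labels)  # one label per user (4 values)
```

With this input, users 0 and 1 fall into the same cluster and users 2 and 3 each get their own; the label numbers themselves are arbitrary. Ward linkage cannot be used here, because it only accepts Euclidean distances computed from feature vectors.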
