Question

I have a precomputed similarity matrix Sim, where s_ij is the similarity between vector i and vector j.

I am trying to compute clusters by doing:

  from sklearn.cluster import SpectralClustering

  clustering = SpectralClustering(cluster_count, affinity='precomputed', eigen_solver='arpack')
  # fit_predict fits the model and returns the labels, so a separate fit() call is redundant
  clusters = clustering.fit_predict(sparse_dok_sim_matrix)
  print(clusters)

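What I get looks like cluster labels, but they are completely bogus. The weight of the edges between samples in the same cluster is 99% of the edge weight in the graph. The clustering results look completely random and make no sense.

Any suggestions? Maybe I am doing something wrong?

For example, I tried using dbscan and did not get any useful result either: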

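import numpy as np
from scipy.linalg import block_diag
from sklearn.cluster import dbscan

# Distance 0 inside each block (sizes 3, 3, 4), distance 1000 between blocks
results = block_diag(np.ones((3,3)), np.ones((3,3)), np.ones((4,4)))
results = 1000 * (np.ones((len(results), len(results))) - results)
print(results)
print(dbscan(X=results.astype(float), metric='precomputed'))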

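This is the result. It says everything is noise, even though the first three points are clearly at the same location, as are the next three ... and the last four.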

[[    0.     0.     0.  1000.  1000.  1000.  1000.  1000.  1000.  1000.]
 [    0.     0.     0.  1000.  1000.  1000.  1000.  1000.  1000.  1000.]
 [    0.     0.     0.  1000.  1000.  1000.  1000.  1000.  1000.  1000.]
 [ 1000.  1000.  1000.     0.     0.     0.  1000.  1000.  1000.  1000.]
 [ 1000.  1000.  1000.     0.     0.     0.  1000.  1000.  1000.  1000.]
 [ 1000.  1000.  1000.     0.     0.     0.  1000.  1000.  1000.  1000.]
 [ 1000.  1000.  1000.  1000.  1000.  1000.     0.     0.     0.     0.]
 [ 1000.  1000.  1000.  1000.  1000.  1000.     0.     0.     0.     0.]
 [ 1000.  1000.  1000.  1000.  1000.  1000.     0.     0.     0.     0.]
 [ 1000.  1000.  1000.  1000.  1000.  1000.     0.     0.     0.     0.]]
(array([], dtype=int64), array([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1]))

Answer 1

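For DBSCAN: according to the documentation, min_samples defaults to 5. None of your "clusters" has 5 samples, so everything is labeled as noise. For SpectralClustering, I can't help you without more details.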
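
To illustrate the min_samples point, here is a minimal sketch that rebuilds the distance matrix from the question with plain NumPy and runs dbscan twice. The value min_samples=3 is only an assumption chosen to match the smallest block in this toy example; it is not a recommendation for real data. The variable names (groups, dist) are made up for the illustration.

```python
import numpy as np
from sklearn.cluster import dbscan

# Rebuild the 10x10 distance matrix from the question:
# distance 0 inside each block (sizes 3, 3, 4), 1000 between blocks.
groups = np.repeat([0, 1, 2], [3, 3, 4])
dist = 1000.0 * (groups[:, None] != groups[None, :])

# Default min_samples=5: no point has 5 neighbours within eps,
# so every point is labelled as noise (-1).
_, default_labels = dbscan(dist, metric='precomputed')

# Illustrative choice: min_samples=3 matches the smallest block,
# so every point becomes a core sample of its own block.
_, labels = dbscan(dist, metric='precomputed', min_samples=3)

print(default_labels)
print(labels)
```

With the lower threshold, the three blocks should come out as three separate clusters instead of noise. On real data, eps and min_samples would have to be tuned together.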
