Spectral clustering with a similarity matrix constructed from the Jaccard coefficient

Time: 2015-06-10 07:42:37

Tags: machine-learning cluster-analysis pca eigenvalue eigenvector

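I have a categorical dataset and I am performing spectral clustering on it, but I am not getting good output. I select the eigenvectors corresponding to the largest eigenvalues as the centroids for k-means.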

Below is the procedure I follow:

1. Create a symmetric similarity matrix (m*m) using the Jaccard coefficient.
   For example, for the data set
   a,b,c,d
   a,b,x,y
   the similarity matrix I compute looks like:
   |1       0.33|
   |0.33    1   |
2. Compute the first k eigenvectors, corresponding to the k largest eigenvalues, where k is the number of clusters.
3. Normalize the symmetric similarity matrix.
4. Perform the clustering on the normalized similarity matrix, using the eigenvectors as initial centroids for k-means.
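
For reference, step 1 can be sketched in plain Python as below. This is only an illustration of the Jaccard computation for the two example records; the function names `jaccard_similarity` and `similarity_matrix` are hypothetical, and treating two empty records as identical is an assumption.

```python
def jaccard_similarity(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two collections of categories."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # assumed convention: two empty records count as identical
    return len(set_a & set_b) / len(union)


def similarity_matrix(records):
    """Symmetric m x m matrix of pairwise Jaccard coefficients."""
    m = len(records)
    sim = [[1.0] * m for _ in range(m)]  # diagonal: each record vs. itself
    for i in range(m):
        for j in range(i + 1, m):
            sim[i][j] = sim[j][i] = jaccard_similarity(records[i], records[j])
    return sim


records = [["a", "b", "c", "d"], ["a", "b", "x", "y"]]
S = similarity_matrix(records)
for row in S:
    print([round(v, 2) for v in row])
# [1.0, 0.33]
# [0.33, 1.0]
```

The two records share 2 of 6 distinct values, which gives the 0.33 off-diagonal entry shown in the example.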

My questions are:

Is computing a Jaccard similarity matrix the right choice for spectral clustering?

Is selecting eigenvectors as cluster centroids the right approach for spectral clustering? I don't see other options for a categorical dataset.

Is there anything wrong with the procedure I follow?

1 answer:

Answer 0 (score: 1)

As far as I can tell, you have mixed and shuffled together a number of different methods. No wonder it doesn't work...

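  1. You could simply use Jaccard distance (the complement of Jaccard similarity, i.e. 1 minus the similarity) + hierarchical clustering.
  2. You could project your data with MDS and then run k-means (probably what you are trying to do).
  3. Affinity propagation etc. are worth a try.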
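
A rough sketch of option 1, assuming average linkage (the answer does not say which linkage to use) and a naive pure-Python implementation; `jaccard_distance` and `average_linkage` are hypothetical names, and the four records are made-up toy data. A library routine for hierarchical clustering would normally be preferable on real data.

```python
def jaccard_distance(a, b):
    """Jaccard distance: 1 minus the Jaccard similarity of two records."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumed convention: two empty records are at distance 0
    return 1.0 - len(set_a & set_b) / len(union)


def average_linkage(records, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest mean pairwise Jaccard distance."""
    clusters = [[i] for i in range(len(records))]  # start with singletons
    while len(clusters) > max(n_clusters, 1):
        best = None  # (distance, index of first cluster, index of second)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[x] for j in clusters[y]]
                d = sum(jaccard_distance(records[i], records[j])
                        for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge y into x
        del clusters[y]
    return clusters


# Toy data (made up for illustration): two groups of similar records.
records = [
    ["a", "b", "c", "d"],
    ["a", "b", "x", "y"],
    ["p", "q", "r", "s"],
    ["p", "q", "r", "t"],
]
clusters = average_linkage(records, n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Each inner list holds the indices of the records assigned to one cluster. The double loop recomputes all pairwise distances on every merge, so this is only suitable for small datasets.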