I have a similarity matrix that I have calculated between a large number of objects, and each object can have a non-zero similarity with any other object. I generated this matrix for another task, and would now like to cluster it for a new analysis.
It seems like scikit-learn's spectral clustering could be a good fit, because I can pass in a precomputed affinity matrix. I also know that spectral clustering typically uses some number of nearest neighbors when building the affinity matrix, and my similarity matrix does not have that constraint.
If I pass in an affinity matrix in which each node can have edges to any number of other nodes, will scikit-learn limit each node to a certain number of nearest neighbors? If not, I guess I will have to make that change to my precomputed affinity matrix myself.
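If you do want to restrict each node to its most similar neighbors yourself, the change can be made directly on the precomputed matrix. The sketch below is one way to do it, assuming `S` is a dense, symmetric, non-negative NumPy array. `knn_sparsify` is a hypothetical helper (not a scikit-learn function), and keeping an edge when either endpoint selects it, with the larger of the two weights, is just one reasonable symmetrization choice. (scikit-learn also offers `affinity="precomputed_nearest_neighbors"`, but that option expects a distance matrix rather than similarities and builds a binary graph.)

```python
import numpy as np
from scipy import sparse


def knn_sparsify(S, k):
    """Hypothetical helper: keep each row's k largest off-diagonal similarities.

    Returns a symmetric scipy.sparse CSR matrix. Assumes 1 <= k < S.shape[0].
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    S_no_self = S.copy()
    np.fill_diagonal(S_no_self, -np.inf)  # a node is never its own neighbor
    # Column indices of the k largest similarities in each row (unordered)
    idx = np.argpartition(-S_no_self, k, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    W = sparse.csr_matrix((S[rows, cols], (rows, cols)), shape=(n, n))
    # Keep an edge if either endpoint selected it, using the larger weight
    return W.maximum(W.T)


# Toy example: two pairs of strongly similar objects
S = np.array([[1.0, 0.9, 0.1, 0.2],
              [0.9, 1.0, 0.3, 0.1],
              [0.1, 0.3, 1.0, 0.8],
              [0.2, 0.1, 0.8, 1.0]])
W = knn_sparsify(S, k=1)
print(W.toarray())
```

The sparse result can then be passed to `SpectralClustering(affinity="precomputed")` in place of the dense matrix.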
Answer 0 (score: 1)
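You don't have to compute the affinities yourself to do spectral clustering; sklearn will do that for you.
When you create `sc = SpectralClustering()`, the `affinity` parameter lets you choose how the affinity matrix is computed. The default is the `rbf` kernel, which does not use a specific number of nearest neighbors. However, if you choose `affinity="nearest_neighbors"` instead, that number is set with the `n_neighbors` parameter (10 by default).
Then you can compute the clusters with `sc.fit_predict(your_matrix)`. Note that with the default `rbf` kernel, `your_matrix` is treated as feature vectors; if it is already a similarity matrix, pass `affinity="precomputed"` so that it is used directly as the affinity matrix.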
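To answer the original question directly: with `affinity="precomputed"`, `SpectralClustering` uses the matrix you pass as the affinity matrix unchanged (it is stored as-is in `affinity_matrix_`), so it does not cut each node down to a fixed number of nearest neighbors. `n_neighbors` only matters for the nearest-neighbor affinity options. A minimal sketch with a made-up 6×6 similarity matrix in which every pair of objects has a non-zero similarity:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Made-up dense similarity matrix: two groups {0, 1, 2} and {3, 4, 5}.
# Every pair has a non-zero similarity, as in the question.
S = np.full((6, 6), 0.05)
S[:3, :3] = 0.9
S[3:, 3:] = 0.9
np.fill_diagonal(S, 1.0)

# affinity="precomputed": S is used directly as the affinity matrix,
# so every non-zero entry remains an edge (no nearest-neighbor pruning).
sc = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = sc.fit_predict(S)
print(labels)
```

The matrix should be symmetric and non-negative; which group receives label 0 is arbitrary.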
Answer 1 (score: 1)
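Spectral clustering does not require a sparse matrix.
But if I'm not mistaken, finding the eigenvectors for the smallest eigenvalues of the graph Laplacian is faster for a sparse matrix than for a dense one. The worst case may still be O(n^3); spectral clustering is one of the slowest methods you will find.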
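To see the sparse route in practice, here is a sketch assuming raw feature vectors are available (synthetic data from `make_blobs` stands in for them). `kneighbors_graph` builds a sparse k-nearest-neighbor graph, symmetrized by averaging in the same way sklearn's own `affinity="nearest_neighbors"` option does, and `SpectralClustering` accepts the resulting `scipy.sparse` matrix with `affinity="precomputed"`. The choice k = 10 is arbitrary; the example shows the storage difference (roughly n·k stored entries instead of n²) rather than measuring speed.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph

# Synthetic feature data, for illustration only
X, _ = make_blobs(n_samples=500, centers=3, random_state=0)
n = X.shape[0]

# Sparse k-nearest-neighbor connectivity graph (CSR), symmetrized by averaging
k = 10
A = kneighbors_graph(X, n_neighbors=k, include_self=True)
W = 0.5 * (A + A.T)

print(f"stored entries: {W.nnz} of {n * n} ({W.nnz / (n * n):.1%})")

# SpectralClustering accepts a scipy.sparse matrix as a precomputed affinity
sc = SpectralClustering(n_clusters=3, affinity="precomputed", random_state=0)
labels = sc.fit_predict(W)
```

If the kNN graph splits into several connected components, sklearn may warn that the graph is not fully connected; for well-separated groups that is expected.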