Question

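I am trying to cluster data whose features I cannot access. I only have the pairwise distances between data points. The distance matrix is mostly block diagonal, and most entries outside the blocks are infinite. The exceptions are a few noise points with finite distances. Here is the output of plt.matshow: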

I cluster with sklearn's spectral clustering implementation, set up as follows:

import numpy as np
from sklearn.cluster import SpectralClustering

# The largest finite distance is on the order of 10^6, so scaling is
# needed for np.exp to return nonzero values.
non_inf_max = np.max(distances[np.isfinite(distances)])
distances_scaled = distances / non_inf_max

result = SpectralClustering(affinity='precomputed',
                            n_clusters=NUM_BLOCKS,
                            random_state=0).fit(np.exp(-distances_scaled**2))
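
To make the effect of this scaling concrete, here is a minimal sketch on a made-up 3-point distance matrix (not my real data). Dividing by the largest finite distance is the same as using a Gaussian kernel whose bandwidth equals that distance, so, assuming the distances are non-negative, every finite entry ends up with an affinity between exp(-1) and 1. The helper `gaussian_affinity` and the median-based bandwidth are hypothetical choices for illustration only, not something sklearn provides.

```python
import numpy as np

def gaussian_affinity(distances, sigma):
    """Map distances to exp(-(d / sigma)**2); infinite distances become 0."""
    d = np.asarray(distances, dtype=float)
    affinity = np.zeros_like(d)
    finite = np.isfinite(d)
    affinity[finite] = np.exp(-(d[finite] / sigma) ** 2)
    return affinity

# Toy distance matrix with made-up values (illustration only).
toy = np.array([[0.0,    1e5,    np.inf],
                [1e5,    0.0,    1e6],
                [np.inf, 1e6,    0.0]])

finite_max = toy[np.isfinite(toy)].max()

# Same scaling as in the question: sigma is the largest finite distance,
# so the largest finite distance maps to exp(-1) and the rest lie above it.
as_in_question = gaussian_affinity(toy, sigma=finite_max)

# Hypothetical alternative: the median finite off-diagonal distance as
# bandwidth, which pushes large finite distances closer to zero affinity.
off_diag = ~np.eye(toy.shape[0], dtype=bool)
sigma_median = np.median(toy[np.isfinite(toy) & off_diag])
narrower = gaussian_affinity(toy, sigma=sigma_median)

print(as_in_question[1, 2], narrower[1, 2])
```

If the finite noise entries are among the larger distances, a narrower bandwidth like this would weaken them relative to the in-block entries, but whether that is the right choice for the real matrix is an open question.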

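The ground truth is that the points making up each block should be clustered together. With this setup I get very poor results: an adjusted Rand score of 0.14. Shouldn't this be an ideal case for spectral clustering, or is my approach wrong? Is there a problem with my scaling, or am I missing a normalization step?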
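
For reference, this is the reasoning behind my "ideal case" expectation, as a rough sketch on a toy affinity matrix (made-up values, plain NumPy rather than sklearn's internals). For a perfectly block-diagonal affinity, the normalized Laplacian has one zero eigenvalue per block; a single finite noise entry linking two blocks merges them into one connected component and removes one of those zeros. The function names and the tolerance are my own assumptions.

```python
import numpy as np

def normalized_laplacian_eigenvalues(affinity):
    """Ascending eigenvalues of I - D^{-1/2} A D^{-1/2} for an affinity A."""
    a = np.asarray(affinity, dtype=float)
    a = 0.5 * (a + a.T)  # enforce symmetry
    degree = a.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    positive = degree > 0
    inv_sqrt[positive] = 1.0 / np.sqrt(degree[positive])
    laplacian = np.eye(a.shape[0]) - inv_sqrt[:, None] * a * inv_sqrt[None, :]
    return np.linalg.eigvalsh(laplacian)

def count_near_zero(eigenvalues, tol=1e-8):
    """Number of eigenvalues within tol of zero (tol is an arbitrary choice)."""
    return int(np.sum(np.abs(eigenvalues) < tol))

# Two clean blocks of three points each, affinity 1 inside, 0 outside.
block = np.ones((3, 3))
clean = np.block([[block, np.zeros((3, 3))],
                  [np.zeros((3, 3)), block]])

# Same matrix with one finite "noise" link between the two blocks.
noisy = clean.copy()
noisy[2, 3] = noisy[3, 2] = 0.9

clean_zeros = count_near_zero(normalized_laplacian_eigenvalues(clean))
noisy_zeros = count_near_zero(normalized_laplacian_eigenvalues(noisy))
print(clean_zeros, noisy_zeros)  # 2 1
```

So even a few strong noise links change the spectrum that the clustering relies on, which may be part of what I am seeing.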

Spectral clustering with precomputed affinity performs poorly
