NbClust with a single cluster

Time: 2018-08-13 16:02:20

Tags: r cluster-analysis k-means

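I have a dataset that I split by certain parameters, and I run the NbClust function on each subset to compute the optimal number of clusters. Occasionally a subset contains only one cluster, and NbClust fails with: Error in sample.int(m, k) : cannot take a sample larger than the population when 'replace = FALSE'

Is there a general workaround? Thanks!! Sample data attached: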

df1 = structure(c(-0.01400863, -0.01400863, 0.00712136, 0.01377456, 
0.00712136, 0, 0, -0.00636396, 0, 0.00636396), .Dim = c(5L, 2L
))

library(NbClust)

# Fails on this 5-row input with the sample.int error quoted above
nb = NbClust(data = df1, diss = NULL, distance = "euclidean",
             min.nc = 2, max.nc = 10, method = "kmeans", alphaBeale = 0)
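The error text itself comes from base R's sample.int, which refuses to draw more items than the population holds when sampling without replacement. The snippet below is only a minimal sketch that reproduces the message outside NbClust; the values 5 and 10 are assumptions chosen to match the 5 rows of df1 and max.nc = 10, and it does not show where inside NbClust the call actually happens.

```r
# Reproduce the message in base R: ask for 10 draws from a population of 5
# without replacement (replace = FALSE is the default).
msg <- tryCatch(
  sample.int(5, 10),
  error = function(e) conditionMessage(e)
)
print(msg)
```

One plausible reading is that a candidate cluster count larger than the number of available rows was requested somewhere along the way, though that is a guess rather than something the traceback confirms.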

Edit (8/15/2018): I found a workaround. It simply bypasses NbClust when nrow(unique(df1)) < 5, because max.nc must be at least min.nc + 2.

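# Default: one cluster per distinct row (used when NbClust is skipped)
n_clusters = nrow(unique(df1))

# Only call NbClust when there are at least 5 distinct rows
if (nrow(unique(df1)) > 4) {
  nb = NbClust(data = df1, diss = NULL, distance = "euclidean",
               min.nc = 2, max.nc = min(nrow(unique(df1)), 10),
               method = "kmeans", alphaBeale = 0)
  # Best.partition is the 4th element of the result (nb[4] in the original);
  # its largest label is the number of clusters
  n_clusters = max(nb$Best.partition)
  print(n_clusters)
}

clusters = kmeans(df1, n_clusters)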
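Since the data is split into many subsets, the same guard could be wrapped in a function and applied to each piece. The following is a minimal sketch under that assumption; safe_n_clusters is a hypothetical helper name, not part of NbClust, and it has only been reasoned through for the sample data above, where NbClust is never reached.

```r
# Sample data from the question: 5 rows, 4 of them distinct.
df1 <- structure(c(-0.01400863, -0.01400863, 0.00712136, 0.01377456,
                   0.00712136, 0, 0, -0.00636396, 0, 0.00636396),
                 .Dim = c(5L, 2L))

# Hypothetical helper: returns a cluster count, skipping NbClust for tiny inputs.
safe_n_clusters <- function(x, min.nc = 2, max.nc = 10) {
  n_unique <- nrow(unique(x))
  # Same guard as the edit above: fewer than 5 distinct rows -> no NbClust call,
  # fall back to one cluster per distinct row.
  if (n_unique < 5) {
    return(n_unique)
  }
  # Only evaluated for larger inputs; requires the NbClust package.
  nb <- NbClust::NbClust(data = x, diss = NULL, distance = "euclidean",
                         min.nc = min.nc, max.nc = min(n_unique, max.nc),
                         method = "kmeans", alphaBeale = 0)
  max(nb$Best.partition)
}

k <- safe_n_clusters(df1)
print(k)
```

The returned value can then be passed to kmeans as in the code above. Falling back to one cluster per distinct row follows the original workaround; a different fallback (for example a single cluster) may suit other data better.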

0 answers:

No answers yet