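I have a dataset that I split by certain parameters. For each subset I run the NbClust
function to compute the optimal number of clusters. Occasionally a subset contains only one cluster, and NbClust
then fails with:
Error in sample.int(m, k) : cannot take a sample larger than the population when 'replace = FALSE'
Is there a general workaround? Thanks! Sample data: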
library(NbClust)

# 5 x 2 matrix; rows 1 and 2 are identical, so there are only 4 distinct points
df1 = structure(c(-0.01400863, -0.01400863, 0.00712136, 0.01377456,
                  0.00712136, 0, 0, -0.00636396, 0, 0.00636396), .Dim = c(5L, 2L))
nb = NbClust(data = df1, diss = NULL, distance = "euclidean",
             min.nc = 2, max.nc = 10, method = "kmeans", alphaBeale = 0)
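The "sample.int(m, k)" in the error message matches how base R's stats::kmeans draws its k starting centers from the m rows of the data, which suggests the failure happens when NbClust asks kmeans for more clusters than there are points. A minimal sketch that reproduces the same condition with base R only, assuming that is indeed where the error comes from:

```r
# Same 5 x 2 matrix as in the question (4 distinct rows)
df1 = structure(c(-0.01400863, -0.01400863, 0.00712136, 0.01377456,
                  0.00712136, 0, 0, -0.00636396, 0, 0.00636396), .Dim = c(5L, 2L))

# Requesting 10 centers from 5 rows: kmeans() cannot sample 10 distinct rows
msg = tryCatch(kmeans(df1, centers = 10),
               error = function(e) conditionMessage(e))
print(msg)

# The number of rows and of distinct rows bound how many clusters can be requested
print(nrow(df1))          # 5
print(nrow(unique(df1)))  # 4
```

With max.nc = 10 and only 5 rows, NbClust will eventually try a k larger than the data can support.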
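Edit (8/15/2018): I found a workaround. It simply bypasses NbClust when nrow(unique(df1)) < 5,
because max.nc
must be at least min.nc + 2.
In that case the number of distinct rows is used directly as the number of clusters.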
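# Number of distinct points; used as the cluster count when NbClust is skipped
n_unique = nrow(unique(df1))
n_clusters = n_unique
if (n_unique > 4) {
  nb = NbClust(data = df1, diss = NULL, distance = "euclidean",
               min.nc = 2, max.nc = min(n_unique, 10), method = "kmeans", alphaBeale = 0)
  # Best.partition holds the cluster labels of the best solution,
  # so its maximum is the chosen number of clusters
  n_clusters = max(nb$Best.partition)
  print(n_clusters)
}
clusters = kmeans(df1, n_clusters)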
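Since the data is split into many subsets, the same guard can be wrapped in a function and applied to each one. The sketch below uses a hypothetical helper, pick_n_clusters, that is not part of NbClust; its min_unique and max_k parameters are assumptions mirroring the thresholds in the workaround (5 distinct rows and at most 10 clusters). NbClust is called through NbClust::NbClust, so the small-data fallback runs with base R alone.

```r
# Hypothetical helper (not part of NbClust): reusable version of the guard
pick_n_clusters = function(df, min_unique = 5, max_k = 10) {
  # Number of distinct points in this subset
  n_unique = nrow(unique(df))
  # Too few distinct points for NbClust: use one cluster per distinct point,
  # the same fallback as in the workaround
  if (n_unique < min_unique) {
    return(n_unique)
  }
  # NbClust is referenced with :: so it is only loaded when actually needed
  nb = NbClust::NbClust(data = df, diss = NULL, distance = "euclidean",
                        min.nc = 2, max.nc = min(n_unique, max_k),
                        method = "kmeans", alphaBeale = 0)
  # Labels of the best partition; the largest label is the cluster count
  max(nb$Best.partition)
}

df1 = structure(c(-0.01400863, -0.01400863, 0.00712136, 0.01377456,
                  0.00712136, 0, 0, -0.00636396, 0, 0.00636396), .Dim = c(5L, 2L))
k = pick_n_clusters(df1)   # only 4 distinct rows, so the fallback returns 4
clusters = kmeans(df1, k)
```

In a split-apply workflow this could be called per subset, for example via lapply(split(...), pick_n_clusters), with the split criteria left to the caller.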