Question

I want to determine the number of clusters k, but I cannot use the NbClust function because my data set is too large.

I found an article about K-Means clustering: http://www.r-bloggers.com/k-means-clustering-from-r-in-action/. I tried to run the function from that article, but I get the error message shown below.

Does anyone have a solution for either the NbClust function or the function from the article?

> wssplot <- function(m, nc=15, seed=1234){
+   wss <- (nrow(data)-1)*sum(apply(data,2,var))
+   for (i in 2:nc){
+     set.seed(seed)
+     wss[i] <- sum(kmeans(data, centers=i)$withinss)}
+   plot(1:nc, wss, type="b", xlab="Number of Clusters",
+        ylab="Within groups sum of squares")}

> wssplot(m3)
 Error in apply(data, 2, var) : dim(X) must have a positive length


> nc <- NbClust(m3, min.nc=2, max.nc=20, method="kmeans")
Error: cannot allocate vector of size 447.3 Mb
In addition: Warning messages:
1: In is.factor(x) :
  Reached total allocation of 8139Mb: see help(memory.size)
2: In is.factor(x) :

Answer 1

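Have you tried:

using wssplot,
using sampling to work with a smaller data set, or
doing the process manually: first cluster with k = 2, then with k = 3, and compare the resulting scores?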
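
The three suggestions above could be sketched roughly as follows. This is only a sketch under assumptions: `m3_demo` is a made-up stand-in for `m3`, and the sample size of 2000 rows is arbitrary. Note that the function in the question takes an argument named `m` while its body refers to `data`; that mismatch is a likely cause of the `dim(X) must have a positive length` error, since `data` then resolves to R's built-in `data()` function. The version here names the argument `data` so the two match.

```r
# Elbow plot of the within-groups sum of squares for k = 1..nc.
# The argument is called `data`, matching the name used in the body.
wssplot <- function(data, nc = 15, seed = 1234) {
  wss <- numeric(nc)
  # k = 1: total sum of squares around the column means
  wss[1] <- (nrow(data) - 1) * sum(apply(data, 2, var))
  for (i in 2:nc) {
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc, wss, type = "b", xlab = "Number of Clusters",
       ylab = "Within groups sum of squares")
  invisible(wss)
}

# Hypothetical stand-in for m3: 10000 rows, 4 numeric columns
set.seed(1)
m3_demo <- matrix(rnorm(10000 * 4), ncol = 4)

# Sampling: work on a random subset of rows to reduce memory use
idx <- sample(nrow(m3_demo), 2000)
m3_sample <- m3_demo[idx, , drop = FALSE]

wss <- wssplot(m3_sample, nc = 8)

# Manual comparison: cluster with k = 2, then k = 3, and compare scores
set.seed(1234)
fit2 <- kmeans(m3_sample, centers = 2)
set.seed(1234)
fit3 <- kmeans(m3_sample, centers = 3)
c(k2 = fit2$tot.withinss, k3 = fit3$tot.withinss)
```

With your real data you would replace `m3_demo` with `m3` and pick a sample size that fits in memory. Whether a sample gives the same elbow as the full data set is something you would have to check, for example by repeating the plot on a few different samples.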

First of all, make sure your data is suitable for k-means: can it even be computed on your data?
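
One way to read this suitability check, as a rough sketch: k-means needs numeric columns for which a mean makes sense, and `kmeans()` does not accept missing values. The data frame `m3_demo` below is a made-up stand-in for `m3`, and its column names are hypothetical.

```r
# Hypothetical stand-in for m3 with one missing value and one text column
m3_demo <- data.frame(a = c(1.5, 2.0, NA, 4.2),
                      b = c(10, 20, 30, 40),
                      g = c("x", "y", "x", "y"))

# Keep only numeric columns; k-means averages values within each cluster
num_cols <- vapply(m3_demo, is.numeric, logical(1))
x <- as.matrix(m3_demo[, num_cols, drop = FALSE])

# Drop rows with missing values, which kmeans() cannot handle
x <- x[complete.cases(x), , drop = FALSE]

# If these column means are computable and meaningful, k-means can be tried
colMeans(x)
```

If most columns turn out to be non-numeric, or the means are not meaningful for your variables, k-means may simply be the wrong tool for this data.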
