Question

I need help working out how to find the optimal number of clusters when using k-means clustering in R.

My code is

library(cluster)
library(factoextra)


#read data
# use forward slashes: in an R string "\f" is a form-feed escape, not a path separator
data <- read.csv("../file.txt", header = FALSE, sep = " ")

#determine number of clusters to use
k.max<- 22
wss <- sapply(2:k.max, function(k){kmeans(data, k, nstart=10 )$tot.withinss})

print(wss)

plot(2:k.max, wss, type="b", pch = 19,  xlab="Number of clusters K", ylab="Total within-clusters sum of squares")


fviz_nbclust(data, kmeans, method = "wss") + geom_vline(xintercept = 3, linetype = 2)

I get the plot, but I still don't know how to read the optimal number of clusters from it.

Thanks

My plot is in this link. It shows the relation between WSS and the number of clusters, with no information about the optimal number of clusters.

Answer 1

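There is no sound mathematical definition of the "elbow" (x and y are on different scales, so there is no meaningful angle), and in a plot like yours there may simply be no "elbow" at all.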
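To make the point about scales concrete, here is a minimal sketch in base R. The WSS values are made up for illustration and `angle_at` is a hypothetical helper, not part of any package. It measures the angle at the middle of three points on a curve, then measures it again after only the y-axis units have changed.

```r
# Angle (in degrees) at the middle of three consecutive points on a curve.
# angle_at is a hypothetical helper written for this illustration.
angle_at <- function(x, y) {
  v1 <- c(x[1] - x[2], y[1] - y[2])  # vector from the middle point back to the first
  v2 <- c(x[3] - x[2], y[3] - y[2])  # vector from the middle point on to the third
  cos_theta <- sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))
  acos(cos_theta) * 180 / pi
}

k   <- c(2, 3, 4)
wss <- c(100, 40, 30)  # made-up WSS values, for illustration only

angle_raw    <- angle_at(k, wss)        # WSS in its original units
angle_scaled <- angle_at(k, wss / 100)  # same curve, y-axis divided by 100

print(c(raw = angle_raw, scaled = angle_scaled))
```

The two angles differ even though the curve is the same, which is why "the sharpest bend" is not a property of the data alone.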

Most likely, k-means just doesn't work well on this data for any k. That happens a lot, for example when your data doesn't contain any clusters.

Try generating uniform random data and making the same plot. It will look similar.
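A minimal sketch of that experiment, using only base R. The sample size, dimension, seed and range of k are arbitrary choices for illustration, not values from the question.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# 500 points drawn uniformly from the unit square: no cluster structure by construction
uniform_data <- matrix(runif(500 * 2), ncol = 2)

# Total within-cluster sum of squares for each candidate k
k_values <- 2:15
wss_uniform <- sapply(k_values, function(k) {
  kmeans(uniform_data, centers = k, nstart = 10, iter.max = 50)$tot.withinss
})

# Same kind of plot as in the question, now on data known to have no clusters
plot(k_values, wss_uniform, type = "b", pch = 19,
     xlab = "Number of clusters K",
     ylab = "Total within-clusters sum of squares",
     main = "WSS on uniform random data")
```

If the curve from your own data looks much like this one, the WSS plot is not giving evidence for any particular k.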

Answer 2

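# df is the numeric data to cluster; this computes the average silhouette width for each k up to 30
n_clust <- fviz_nbclust(df, kmeans, method = "silhouette", k.max = 30)
# keep only the data behind the plot (columns `clusters` and `y`)
n_clust <- n_clust$data
# k with the largest average silhouette width (as.character guards against factor codes)
max_cluster <- as.numeric(as.character(n_clust$clusters[which.max(n_clust$y)]))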
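The same idea can be sketched by hand with `silhouette()` from the `cluster` package, which may help show what is being maximised. This is an illustration under assumptions, not how `fviz_nbclust` is implemented internally: the `toy` data set (three well-separated blobs), the seed and the range of k are all made up here.

```r
library(cluster)  # provides silhouette()

set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical stand-in data: three well-separated 2-D Gaussian blobs
toy <- rbind(
  matrix(rnorm(100, mean = 0,  sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 5,  sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 10, sd = 0.5), ncol = 2)
)

d <- dist(toy)  # pairwise Euclidean distances, computed once and reused
k_values <- 2:10

# Average silhouette width of the k-means partition for each candidate k
avg_sil <- sapply(k_values, function(k) {
  km <- kmeans(toy, centers = k, nstart = 10)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})

# Candidate k with the largest average silhouette width
best_k <- k_values[which.max(avg_sil)]
print(best_k)
```

Note that the silhouette is undefined for k = 1, so the search starts at k = 2. A maximum always exists, even on data with no real clusters, so a low peak value is worth checking before trusting the chosen k.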
