Question

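I'm new to data science and have just started out on this exciting journey. I have been studying cluster analysis as part of EDA. While learning it, I came across this statement many times: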

Clustering is in the eye of the beholder

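But while playing with some data from Kaggle, I decided to run a for loop to compare the number of clusters against the total within-cluster sum of squares (tot.withinss), as follows: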

Attempt {1}

ss1<-list()
set.seed(2)
for(i in 2:100){
  # fit k-means with i clusters on columns 2 and 3 of bone (default nstart = 1)
  cluster<-kmeans(bone[,2:3],centers = i)
  ss1[[i]]<-cluster$tot.withinss
}

# cluster labels from the last fit (i = 100)
bone$clusters<-cluster$cluster

#plotting number of clusters against tot.withinss
ss1<-unlist(ss1)  # drops the empty first slot, leaving values for k = 2..100
num_clust<-data.frame(x=2:100,y=ss1)

plot(num_clust,xlab='Number of Clusters',ylab='Total within sum squared error',
     main='Number of Clusters Vs Tot.withinss')
abline(v = 4.5,lty='dashed')

[Plot for attempt {1}: total within sum squared error against number of clusters]

Attempt {2}

ss2<-list()
set.seed(2)
for(i in 2:100){
  # fit k-means with i clusters, keeping the best of 30 random starts
  cluster<-kmeans(bone[,2:3],centers = i,nstart = 30)
  ss2[[i]]<-cluster$tot.withinss
}

# cluster labels from the last fit (i = 100)
bone$clusters<-cluster$cluster

#plotting number of clusters against tot.withinss
ss2<-unlist(ss2)  # drops the empty first slot, leaving values for k = 2..100
num_clust<-data.frame(x=2:100,y=ss2)

plot(num_clust,xlab='Number of Clusters',ylab='Total within sum squared error',
     main='Number of Clusters Vs Tot.withinss')
abline(v = 4.5,lty='dashed')

[Plot for attempt {2}: total within sum squared error against number of clusters, nstart = 30]

Attempt {3}

ss3<-list()
set.seed(2)
for(i in 2:100){
  # fit k-means with i clusters, keeping the best of 100 random starts
  cluster<-kmeans(bone[,2:3],centers = i,nstart = 100)
  ss3[[i]]<-cluster$tot.withinss
}

# cluster labels from the last fit (i = 100)
bone$clusters<-cluster$cluster

#plotting number of clusters against tot.withinss
ss3<-unlist(ss3)  # drops the empty first slot, leaving values for k = 2..100
num_clust<-data.frame(x=2:100,y=ss3)

plot(num_clust,xlab='Number of Clusters',ylab='Total within sum squared error',
     main='Number of Clusters Vs Tot.withinss')
abline(v = 4.5,lty='dashed')

[Plot for attempt {3}: total within sum squared error against number of clusters, nstart = 100]

Attempt {4}

ss4<-list()
set.seed(2)
for(i in 2:100){
  # fit k-means with i clusters, keeping the best of 200 random starts
  cluster<-kmeans(bone[,2:3],centers = i,nstart = 200)
  ss4[[i]]<-cluster$tot.withinss
}

# cluster labels from the last fit (i = 100)
bone$clusters<-cluster$cluster

#plotting number of clusters against tot.withinss
ss4<-unlist(ss4)  # drops the empty first slot, leaving values for k = 2..100
num_clust<-data.frame(x=2:100,y=ss4)

plot(num_clust,xlab='Number of Clusters',ylab='Total within sum squared error',
     main='Number of Clusters Vs Tot.withinss')
abline(v = 4.5,lty='dashed')

[Plot for attempt {4}: total within sum squared error against number of clusters, nstart = 200]
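The four attempts differ only in nstart, so the repeated loop could be folded into one helper and the curves overlaid on a single plot. This is only a sketch: `toy` is synthetic data standing in for bone[,2:3] (which is not shown here), `elbow_curve` is a hypothetical helper name, and the range of k is cut to 2:15 to keep it quick.

```r
# Synthetic stand-in for bone[,2:3] (an assumption: four Gaussian blobs)
set.seed(2)
centers_true <- matrix(c(0, 0, 6, 0, 0, 6, 6, 6), ncol = 2, byrow = TRUE)
toy <- do.call(rbind, lapply(1:4, function(j) {
  cbind(rnorm(50, centers_true[j, 1]), rnorm(50, centers_true[j, 2]))
}))

# Hypothetical helper: tot.withinss for each k in ks, for a given nstart
elbow_curve <- function(data, ks, nstart = 1) {
  sapply(ks, function(k) kmeans(data, centers = k, nstart = nstart)$tot.withinss)
}

ks <- 2:15
nstarts <- c(1, 30, 100, 200)
curves <- sapply(nstarts, function(n) {
  set.seed(2)  # same seed before each run, as in the attempts above
  elbow_curve(toy, ks, nstart = n)
})
colnames(curves) <- paste0("nstart=", nstarts)

# One column of tot.withinss per nstart value, one row per k
matplot(ks, curves, type = "b", pch = 1:4, lty = 1,
        xlab = "Number of Clusters", ylab = "Total within sum squared error",
        main = "Number of Clusters Vs Tot.withinss")
legend("topright", legend = colnames(curves), pch = 1:4, col = 1:4, lty = 1)
```

If the curves lie on top of each other for small k, the extra random starts are not changing the fit there.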

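In the plots from the four attempts, you can see:

1- There are 4 clusters whose tot.withinss differ greatly

2- These 4 clusters are stable regardless of the number of random starting centroids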
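One way the stability claim in point 2 might be checked directly, rather than read off the plots, is to compare the partitions found at k = 4 under different nstart values. The sketch below uses synthetic data (`toy`, an assumption standing in for bone[,2:3]). Cluster labels are arbitrary, so the comparison goes through a cross-tabulation instead of testing the label vectors for equality.

```r
# Synthetic stand-in for bone[,2:3] (an assumption: four Gaussian blobs)
set.seed(2)
centers_true <- matrix(c(0, 0, 6, 0, 0, 6, 6, 6), ncol = 2, byrow = TRUE)
toy <- do.call(rbind, lapply(1:4, function(j) {
  cbind(rnorm(50, centers_true[j, 1]), rnorm(50, centers_true[j, 2]))
}))

# Fit k = 4 with a single random start and with 200 random starts
set.seed(2)
fit_1 <- kmeans(toy, centers = 4, nstart = 1)
set.seed(2)
fit_200 <- kmeans(toy, centers = 4, nstart = 200)

# The two partitions coincide (up to relabelling) when every row and
# every column of the cross-tabulation has exactly one nonzero cell
tab <- table(fit_1$cluster, fit_200$cluster)
same_partition <- all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)

print(tab)
print(same_partition)
print(c(nstart_1 = fit_1$tot.withinss, nstart_200 = fit_200$tot.withinss))
```

Equal tot.withinss values alone do not prove that the same points were grouped together, which is why the partition itself is compared here.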

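Conclusion:

Can I use this method to determine the number of clusters from the plot, based on the differences in the total within-cluster sum of squares, instead of choosing K at random?

Is this a valid way to determine the number of clusters in k-means?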
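Reading the bend in a plot like this is commonly called the elbow heuristic. The "differences in the total within-cluster sum of squares" can also be tabulated instead of judged by eye. A minimal sketch, again on synthetic data (`toy` is an assumption, not the Kaggle data), with nstart = 30 chosen arbitrarily:

```r
# Synthetic stand-in for bone[,2:3] (an assumption: four Gaussian blobs)
set.seed(2)
centers_true <- matrix(c(0, 0, 6, 0, 0, 6, 6, 6), ncol = 2, byrow = TRUE)
toy <- do.call(rbind, lapply(1:4, function(j) {
  cbind(rnorm(50, centers_true[j, 1]), rnorm(50, centers_true[j, 2]))
}))

ks <- 2:10
wss <- sapply(ks, function(k) kmeans(toy, centers = k, nstart = 30)$tot.withinss)

# Drop in tot.withinss gained by going from k - 1 to k clusters
drops <- -diff(wss)
# Share of the remaining error removed by that extra cluster
rel_drop <- drops / head(wss, -1)

print(data.frame(k = ks[-1], drop = round(drops, 1), rel_drop = round(rel_drop, 3)))
```

A large drop followed by consistently small ones marks the candidate K. Since tot.withinss tends to keep shrinking as k grows, the table suggests where the gains level off rather than proving a single correct K.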

0 answers: