Is this a valid way to determine the number of clusters in K-means?

Time: 2018-11-20 19:24:13

Tags: r cluster-analysis k-means

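I am new to data science and have just started down this path. Along the way I have been studying cluster analysis as part of EDA. While learning it, I have read many statements like this one: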

  

Clustering is in the eye of the beholder

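But while playing with some data from Kaggle, I decided to run a for loop to compare the number of clusters against the total within-cluster sum of squares (tot.withinss), as follows: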

  

Attempt {1}

# default nstart: a single random start for each k
ss1<-list()
set.seed(2)
for(i in 2:100){
  cluster<-kmeans(bone[,2:3],centers = i)
  ss1[[i]]<-cluster$tot.withinss
}

# after the loop, cluster holds the fit for k = 100
bone$clusters<-cluster$cluster

#plotting number of clusters against tot.withinss
ss1<-unlist(ss1)  # drops the empty first slot, leaving 99 values for k = 2..100
num_clust<-data.frame(x=2:100,y=ss1)  # x starts at 2 so each point matches its k

plot(num_clust,xlab='Number of Clusters',ylab='Total within sum squared error',
     main='Number of Clusters Vs Tot.withinss')
abline(v = 4.5,lty='dashed')

[Plot: Number of Clusters Vs Tot.withinss, default nstart]

  

Attempt {2}

# nstart = 30: best of 30 random starts for each k
ss2<-list()
set.seed(2)
for(i in 2:100){
  cluster<-kmeans(bone[,2:3],centers = i,nstart = 30)
  ss2[[i]]<-cluster$tot.withinss
}

# after the loop, cluster holds the fit for k = 100
bone$clusters<-cluster$cluster

#plotting number of clusters against tot.withinss
ss2<-unlist(ss2)  # drops the empty first slot, leaving 99 values for k = 2..100
num_clust<-data.frame(x=2:100,y=ss2)  # x starts at 2 so each point matches its k

plot(num_clust,xlab='Number of Clusters',ylab='Total within sum squared error',
     main='Number of Clusters Vs Tot.withinss')
abline(v = 4.5,lty='dashed')

[Plot: Number of Clusters Vs Tot.withinss, nstart = 30]

  

Attempt {3}

# nstart = 100: best of 100 random starts for each k
ss3<-list()
set.seed(2)
for(i in 2:100){
  cluster<-kmeans(bone[,2:3],centers = i,nstart = 100)
  ss3[[i]]<-cluster$tot.withinss
}

# after the loop, cluster holds the fit for k = 100
bone$clusters<-cluster$cluster

#plotting number of clusters against tot.withinss
ss3<-unlist(ss3)  # drops the empty first slot, leaving 99 values for k = 2..100
num_clust<-data.frame(x=2:100,y=ss3)  # x starts at 2 so each point matches its k

plot(num_clust,xlab='Number of Clusters',ylab='Total within sum squared error',
     main='Number of Clusters Vs Tot.withinss')
abline(v = 4.5,lty='dashed')

[Plot: Number of Clusters Vs Tot.withinss, nstart = 100]

  

Attempt {4}

# nstart = 200: best of 200 random starts for each k
ss4<-list()
set.seed(2)
for(i in 2:100){
  cluster<-kmeans(bone[,2:3],centers = i,nstart = 200)
  ss4[[i]]<-cluster$tot.withinss
}

# after the loop, cluster holds the fit for k = 100
bone$clusters<-cluster$cluster

#plotting number of clusters against tot.withinss
ss4<-unlist(ss4)  # drops the empty first slot, leaving 99 values for k = 2..100
num_clust<-data.frame(x=2:100,y=ss4)  # x starts at 2 so each point matches its k

plot(num_clust,xlab='Number of Clusters',ylab='Total within sum squared error',
     main='Number of Clusters Vs Tot.withinss')
abline(v = 4.5,lty='dashed')

[Plot: Number of Clusters Vs Tot.withinss, nstart = 200]

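As you can see:

1- There are 4 clusters with a large difference in their tot.withinss.

2- These 4 clusters are stable regardless of the number of random starting centroids.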
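
The four attempts differ only in `nstart`, so the comparison can be sketched with a single helper. This is a minimal sketch, not the code that produced the plots: the `bone` data is not shown here, so a synthetic two-column data set with four well-separated groups stands in for `bone[,2:3]`, and the helper name `wss_curve` is hypothetical.

```r
# Synthetic stand-in for bone[, 2:3] (assumption: the real data is not shown).
set.seed(2)
centers <- matrix(c(0, 0, 10, 0, 0, 10, 10, 10), ncol = 2, byrow = TRUE)
toy <- do.call(rbind, lapply(1:4, function(j) {
  cbind(rnorm(50, centers[j, 1]), rnorm(50, centers[j, 2]))
}))

# Hypothetical helper: tot.withinss for each k, for a given nstart.
wss_curve <- function(data, ks, nstart) {
  sapply(ks, function(k) {
    kmeans(data, centers = k, nstart = nstart, iter.max = 50)$tot.withinss
  })
}

ks <- 2:10
nstarts <- c(1, 30, 100)
curves <- sapply(nstarts, function(n) wss_curve(toy, ks, n))
colnames(curves) <- paste0("nstart=", nstarts)

# One line per nstart; lines that overlap suggest the curve does not
# depend much on the random starting centroids.
matplot(ks, curves, type = "b", pch = 1, lty = 1, col = 1:3,
        xlab = "Number of Clusters", ylab = "Total within sum squared error")
legend("topright", legend = colnames(curves), col = 1:3, lty = 1)
```

Drawing all the curves on one plot makes it easier to judge whether the bend stays at the same k as `nstart` grows than comparing separate figures by eye.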

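Conclusion:

Can I use this method to determine the number of clusters from the plot, based on the differences in the total within-cluster sum of squares, instead of choosing K at random?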
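
To make "the differences in the total within-cluster sum of squares" concrete, one option is to tabulate how much tot.withinss drops each time a cluster is added. This is a hedged sketch on synthetic data standing in for `bone[,2:3]`; the four-group structure is built into the toy data and is an assumption, not a property of the real data.

```r
# Synthetic stand-in for bone[, 2:3] (assumption: the real data is not shown).
set.seed(2)
centers <- matrix(c(0, 0, 10, 0, 0, 10, 10, 10), ncol = 2, byrow = TRUE)
toy <- do.call(rbind, lapply(1:4, function(j) {
  cbind(rnorm(50, centers[j, 1]), rnorm(50, centers[j, 2]))
}))

ks <- 2:10
fits <- lapply(ks, function(k) kmeans(toy, centers = k, nstart = 30))
wss <- sapply(fits, function(f) f$tot.withinss)
totss <- fits[[1]]$totss  # total sum of squares, identical for every k

# Drop in tot.withinss gained by going from k - 1 to k clusters,
# also expressed as a share of the total sum of squares.
drops <- data.frame(k = ks[-1],
                    drop = -diff(wss),
                    share = -diff(wss) / totss)
print(drops)
```

A large drop followed by consistently small ones is the numeric counterpart of the bend in the plot. It is still a heuristic, since tot.withinss keeps decreasing as k grows.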

0 answers:

No answers