Question

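With the code below, I get the clustering output plotted for 'Sepal.Length' and 'Sepal.Width', but I also want to know which data points belong to which cluster. How can I do that?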

> newiris <- iris
> newiris$Species <- NULL

> (kc <- kmeans(newiris, 3)) 
K-means clustering with 3 clusters of sizes 38, 50, 62

Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     6.850000    3.073684     5.742105    2.071053
2     5.006000    3.428000     1.462000    0.246000
3     5.901613    2.748387     4.393548    1.433871

Clustering vector:
  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [30] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 1 3 3 3 3 3
 [59] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3
 [88] 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 1 1 1 3 1 1 1 1 1 1 3 3 1
[117] 1 1 1 3 1 3 1 3 1 1 3 3 1 1 1 1 1 3 1 1 1 1 3 1 1 1 3 1 1
[146] 1 3 1 1 3

Within cluster sum of squares by cluster:
[1] 23.87947 15.15100 39.82097

Available components:
[1] "cluster"  "centers"  "withinss" "size"   


> table(iris$Species, kc$cluster)

              1  2  3
  setosa      0 50  0
  versicolor  2  0 48
  virginica  36  0 14

> plot(newiris[c("Sepal.Length", "Sepal.Width")], col=kc$cluster)
> points(kc$centers[,c("Sepal.Length", "Sepal.Width")], col=1:3, pch=8, cex=2)

Answer 1

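You have already shown us the answer. kc$cluster holds the cluster that each observation was assigned to. It is printed by default (the "Clustering vector" in your output), and you can run str(kc) to see everything the kmeans function returns.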

str(kc)
## List of 9
##  $ cluster     : int [1:150] 1 3 3 3 1 1 1 1 3 3 ...
##  $ centers     : num [1:3, 1:4] 5.18 6.31 4.74 3.62 2.9 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:3] "1" "2" "3"
##   .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
##  $ totss       : num 681
##  $ withinss    : num [1:3] 6.43 118.65 17.67
##  $ tot.withinss: num 143
##  $ betweenss   : num 539
##  $ size        : int [1:3] 33 96 21
##  $ iter        : int 2
##  $ ifault      : int 0
##  - attr(*, "class")= chr "kmeans"
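
As a minimal sketch of reading the assignments directly, the snippet below lists the row indices that fall in each cluster. The set.seed call and the names rows_by_cluster and cluster1_rows are assumptions added for illustration and are not part of the original code. Because kmeans picks random starting centers, cluster labels and sizes can differ from run to run, which is probably why the sizes in the str(kc) output above differ from those in the question.

```r
# Assumption: a fixed seed is used only to make this sketch reproducible.
set.seed(42)
newiris <- iris
newiris$Species <- NULL
kc <- kmeans(newiris, 3)

# Row indices of the observations in each cluster, as a list named "1", "2", "3".
rows_by_cluster <- split(seq_len(nrow(newiris)), kc$cluster)

# Observations assigned to cluster 1 (the label numbers are arbitrary between runs).
cluster1_rows <- which(kc$cluster == 1)
head(newiris[cluster1_rows, ])
```

Either form answers "which points are in which cluster": split() gives all clusters at once, while which() is handy when only one cluster is of interest.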

Answer 2

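As Thomas and Mamoun said, the cluster information is in kc$cluster, in the same order as the original observations. It can be added back to the original data set like this: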

newiris <- cbind(newiris, cluster = kc$cluster)
head(newiris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width cluster
1          5.1         3.5          1.4         0.2       1
2          4.9         3.0          1.4         0.2       1
3          4.7         3.2          1.3         0.2       1
4          4.6         3.1          1.5         0.2       1
5          5.0         3.6          1.4         0.2       1
6          5.4         3.9          1.7         0.4       1
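
Once the cluster column is attached, ordinary data-frame operations can be used on it. The following is a sketch under stated assumptions rather than part of the original answer: the seed, and the names features, clustered and cluster_means, are hypothetical and chosen only for illustration. It uses only base R functions (cbind, subset, aggregate).

```r
# Assumption: a fixed seed is used only to make this sketch reproducible.
set.seed(42)
features <- iris[, 1:4]
kc <- kmeans(features, 3)

# Attach the assignment as a factor so it is treated as a label, not a number.
clustered <- cbind(iris, cluster = factor(kc$cluster))

# All rows assigned to one cluster (first few shown).
head(subset(clustered, cluster == "1"))

# Per-cluster means of the four measurements (column 5, Species, is dropped);
# these should agree with kc$centers.
cluster_means <- aggregate(. ~ cluster, data = clustered[, -5], FUN = mean)
cluster_means
```

Keeping Species alongside the cluster column also makes it easy to compare the two, as the table() call in the question does.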

K-means clustering in R: mapping clusters back to the data
