Question

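I am running k-means clustering to assign labels to my data. I run PCA first, then kmeans, and plot the result with ggbiplot.

Now, how can I find out, in table form, which point belongs to which group? That is, I want to label each point in my original data with its group.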

Answer 1

Suppose the data frame is named df and you want k clusters. After running the kmeans function...

# K-means cluster analysis
fit <- kmeans(df, k) # k is the number of clusters

...you have to add the cluster assignments produced by the fit to the data frame

# add clusters to the dataframe
df$clusters <- fit$cluster
df
             a          b clusters
1  -0.96193342 -0.7447816        1
2  -0.29252572 -1.1312186        1
3   0.25878822 -0.7163585        1
4  -1.15213189  0.2526524        1
5   0.19578283  0.1520457        1
6   0.03012394 -0.3076564        1
7   0.08541773 -0.9530173        1
8   1.11661021 -0.6482428        2
9  -1.21885742  1.2243136        1
10  1.26736872  0.1998116        2
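Once the labels are a column of the data frame, the usual data frame tools give the "table form" the question asks for. A minimal sketch, assuming the same toy data as in this answer (it rebuilds df so it can run on its own; the cluster numbers themselves are arbitrary and can change between runs unless a seed is set before kmeans):

```r
# Rebuild the toy data and the fit so this snippet is self-contained
set.seed(3)
df <- data.frame(a = rnorm(10), b = rnorm(10))
fit <- kmeans(df, centers = 2)
df$clusters <- fit$cluster

# Number of points in each cluster
table(df$clusters)

# Row names of the points, grouped by cluster
split(rownames(df), df$clusters)

# All rows assigned to cluster 1
df[df$clusters == 1, ]
```

Note that if kmeans is run again on df after the clusters column has been added, that column would be treated as a variable too, so it is safer to refit on the original columns only, for example df[, c("a", "b")].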

Data used in the example

set.seed(3)
n <- 10
k <- 2
df <- data.frame(a= rnorm(n), b= rnorm(n))

You can also take a look here.
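The question runs PCA before kmeans, so the same idea can be sketched for that workflow. This is only an illustration under assumptions: raw is a hypothetical data frame standing in for the original data, and two principal components and two clusters are arbitrary choices. It relies on prcomp and kmeans both keeping the row order of their input, so the label vector lines up with the rows of the original data.

```r
# Hypothetical stand-in for the original data
set.seed(3)
raw <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))

# PCA on the original variables; keep the first two component scores
pca <- prcomp(raw, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# Cluster the PCA scores instead of the raw variables
fit_pca <- kmeans(scores, centers = 2)

# Row i of scores corresponds to row i of raw, so the labels attach directly
raw$cluster <- fit_pca$cluster
head(raw)
```

If rows were dropped before the PCA (for example because of missing values), the alignment no longer holds automatically and the labels would have to be matched back by row name instead.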
