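I am using the mclust::Mclust() function to cluster a small dataset. However, I am struggling to extract the cluster classification for each row of the dataset.
Here is the data: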
df <- structure(list(latitud = c(-43.8189010620117, -34.2731018066406,
-47.0666999816895, -35.7543983459473, -47.1413993835449, -36.6260986328125,
-37.2118988037109, -33.3086013793945, -37.2792015075684, -35.4524993896484,
-36.5856018066406, -44.6591987609863, -28.6996994018555, -48.1591987609863,
-45.4000015258789, -29.94580078125, -30.4386005401611, -31.6646995544434,
-51.2000007629395, -51.3328018188477, -51.25, -45.551700592041,
-39.0144004821777, -38.6081008911133, -34.9844017028809, -32.8403015136719,
-29.9953002929688, -18.3999996185303, -35.6169013977051, -35.9085998535156,
-35.4068984985352, -32.7571983337402, -32.8502998352051, -33.5938987731934,
-38.4303016662598, -38.6866989135742, -45.4057998657227, -37.5503005981445,
-37.8997001647949, -38.0368995666504, -37.7047004699707, -37.7963981628418,
-37.7092018127441, -31.5835990905762, -30.9242000579834, -38.2008018493652,
-31.6881008148193, -31.8117008209229, -27.9747009277344, -30.7047004699707,
-36.6500015258789, -34.4921989440918, -34.6581001281738, -47.3499984741211,
-47.5, -33.7219009399414, -33.6613998413086, -35.5574989318848
), longitud = c(-72.38330078125, -71.371696472168, -72.8000030517578,
-71.0864028930664, -72.7257995605469, -72.4891967773438, -72.3242034912109,
-70.3572006225586, -71.9847030639648, -71.7332992553711, -71.5255966186523,
-71.8082962036133, -70.5500030517578, -73.0888977050781, -72.5999984741211,
-70.5327987670898, -71.002197265625, -71.2546997070312, -72.9332962036133,
-73.1091995239258, -72.5167007446289, -72.0680999755859, -73.0828018188477,
-72.8478012084961, -72.0100021362305, -71.0255966186523, -70.5867004394531,
-70.3000030517578, -71.7677993774414, -71.2981033325195, -72.2082977294922,
-70.736701965332, -70.5093994140625, -70.3792037963867, -72.0105972290039,
-72.502799987793, -72.6231002807617, -72.5903015136719, -71.6239013671875,
-71.4781036376953, -71.7683029174805, -71.6988983154297, -71.823600769043,
-71.4606018066406, -70.7731018066406, -71.2988967895508, -71.2658004760742,
-70.9302978515625, -69.997802734375, -70.9244003295898, -72.4499969482422,
-71.3731002807617, -71.3019027709961, -72.8499984741211, -72.9749984741211,
-71.5550003051758, -71.3371963500977, -71.7067031860352)), row.names = c(NA,
-58L), class = c("tbl_df", "tbl", "data.frame"))
Clustering:
library(mclust)
d_clust <- Mclust(df)
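Now, when I run plot(d_clust), it shows all the plots. But that does not tell me which cluster each row belongs to. I went through the documentation and other resources (1, 2, 3), as well as related questions about Mclust() (1, 3), but none of them solved my problem.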
I am looking for something like this:
| latitud | longitud | cluster_id |
By the way, class(d_clust) returns Mclust. How can plot(d_clust) draw anything if simply running d_clust does not give me a table or data frame to plot?
Answer 0 (score: 0)
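When you run Mclust, it tries several covariance models and several values of G (the number of clusters), so start by looking at the BIC plot. Mclust picks the best model purely on BIC and stores that choice in d_clust$modelName and d_clust$G.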
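The BIC plot mentioned above can be drawn straight from the fitted object. This is only a minimal sketch: inspect_bic is a hypothetical helper name introduced here for illustration, and it assumes the fit comes from Mclust() as in the question.

```r
library(mclust)

# Hypothetical helper: draws the BIC comparison for a fitted Mclust object
# and returns the model/G combination that Mclust kept.
inspect_bic <- function(fit) {
  plot(fit, what = "BIC")      # one curve per covariance model, x-axis is G
  print(summary(fit$BIC))      # best model/G combinations ranked by BIC
  list(modelName = fit$modelName, G = fit$G)
}

# Usage with the object from the question:
# inspect_bic(d_clust)
```

If the curves of several models sit close together near the maximum, the model Mclust chose is probably not the only reasonable one.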
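Once you know which model to use (I think in your case it is EVE with G = 4), the classification is already stored in the fitted object, and you can pull it out with: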
d_clust$classification
# or
results = data.frame(df,cluster=d_clust$classification)
head(results)
latitud longitud cluster
1 -43.8189 -72.3833 1
2 -34.2731 -71.3717 2
3 -47.0667 -72.8000 1
4 -35.7544 -71.0864 3
5 -47.1414 -72.7258 1
6 -36.6261 -72.4892 3
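To get exactly the cluster_id column asked for in the question, the same idea can be wrapped in a small function. This is a sketch rather than anything provided by mclust: add_cluster_id is a hypothetical name, and the extra uncertainty column is an optional addition that is not part of the original request.

```r
# Hypothetical helper: binds the per-row cluster labels (and their
# uncertainty) from a fitted Mclust object onto the original data.
add_cluster_id <- function(data, fit) {
  # The fit must come from the same rows, in the same order
  stopifnot(nrow(data) == length(fit$classification))
  out <- as.data.frame(data)
  out$cluster_id  <- fit$classification   # label of the most probable component
  out$uncertainty <- fit$uncertainty      # 1 minus the largest membership probability
  out
}

# Usage with the objects from the question:
# results <- add_cluster_id(df, d_clust)
# head(results)
```

Rows with a high uncertainty value sit between components, so their label is the least reliable.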
You can also plot the result:
with(results,plot(latitud,longitud,col=factor(cluster)))
From there you can judge, for example, whether clustering is appropriate for this data at all and whether G = 4 makes sense.
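One way to check how much the result depends on G is to refit with a fixed number of components and cross-tabulate the labels. This is a rough sketch under assumptions: compare_G is a hypothetical helper, and the value of G you pass in is your own choice, not something the data dictates.

```r
library(mclust)

# Hypothetical helper: refits with a fixed number of components and
# cross-tabulates the new labels against the original ones.
compare_G <- function(data, fit, G) {
  refit <- Mclust(data, G = G)   # G is fixed; the covariance model is still chosen by BIC
  table(original = fit$classification, refit = refit$classification)
}

# Usage with the objects from the question:
# compare_G(df, d_clust, G = 3)
```

If most rows keep the same grouping and only one cluster is merged or split, the extra component is probably not doing much work.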