Extracting clusters directly from a correlation matrix?

Asked: 2019-09-19 22:43:04

Tags: r cluster-analysis k-means

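I have a correlation matrix (how different variables correlate with each other) and am trying to find clusters based on the correlated groups. I am currently visualizing the data set as shown below to see how many groupings there might be. Is there a way to extract clusters directly from the distance calculation, rather than using k-means, which gives me different results on every run?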

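For the data below, there are two clear clusters/possible groups, and I would like to know whether any program, tool, or function can extract them.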

Code:

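library(factoextra)
# df is the data frame shown below; get_dist() defaults to Euclidean distance between rows
distance <- get_dist(df)
# Heatmap of the pairwise distances
fviz_dist(distance, gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07"))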

Data frame (row and column names correspond to unique IDs):

"","10039","10649","12095","12095x","12095xx","1250","12651","12757","1276","1278x"
"10039",0.255115618609175,-0.177854388857815,-0.0678361007484356,-0.352181285930436,-0.381869743064316,0.0997439892889507,-0.169349645042077,-0.00965202634702178,-0.0510234989805499,-0.0510234989805499
"10649",0.050959625003298,1,-0.0518039876671861,-0.179833539980186,0.050690373507738,0.128702514770023,-0.035435238826486,-0.0201628019873111,0.134771635570939,0.0356960459697166
"12095",-0.0216545036334337,-0.121466109785836,1,-0.1994809609399,-0.260343912678546,0.087392079332695,0.0123907872824176,0.087392079332695,0.087392079332695,-0.0216545036334337
"12095x",-0.098165021937254,-0.0458019488190383,0.00437032844659258,1,-0.113222380655106,-0.0402373870305558,0.0559033755079281,-0.0239049725192886,-0.0641124655537419,-0.0833206680201322
"12095xx",-0.166450101886438,0.0877515444667101,-0.163978399896905,-0.209807655480148,1,-0.15714120804553,-0.236776402855779,-0.210541064944441,-0.181807583843066,-0.19427423003104
"1250",1,0.187857639241878,0.220566284642567,-0.121083419077306,-0.137374140895616,1,0.34356843512983,1,1,1
"12651",0.252133611190047,0.119534413608904,0.24277152335447,0.0682069261602969,-0.130770022311008,0.435253016076324,1,0.630921537603276,0.411525000221144,0.371485268998199
"12757",0.124889049505779,-0.194934097542696,-0.0953559119912251,-0.276086004602751,-0.354148895078468,0.235644433149966,0.0645380123460424,0.38656429814172,0.0820770489758271,0.0820770489758271
"1276",0.45047612755622,0.0686398225015061,0.0952133592954676,-0.253496989469619,-0.270384179141728,1,0.19469614096637,0.45047612755622,1,1
"1278x",1,0.0367380740951311,0.0691680273310695,-0.254489814171706,-0.269208350534652,1,0.194504158280285,1,1,1

1 answer:

Answer 0: (score: 1)

K-means needs coordinates, not a distance matrix, so its results on this kind of input are quite unreliable in any case.

Convert your correlations into distances.
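The answer does not say which conversion to use. One common convention is d = 1 - r, so that perfectly correlated variables sit at distance 0 and perfectly anti-correlated ones at distance 2. A minimal sketch under that assumption; the matrix `cor_mat` and its values are invented for illustration and are not the asker's data. Note that `as.dist()` reads only the lower triangle, so it expects a symmetric matrix.

```r
# Toy symmetric correlation matrix (made-up values, for illustration only).
ids <- c("a", "b", "c")
cor_mat <- matrix(
  c( 1.0,  0.9, -0.2,
     0.9,  1.0, -0.1,
    -0.2, -0.1,  1.0),
  nrow = 3, byrow = TRUE, dimnames = list(ids, ids)
)

# Assumed convention: distance = 1 - correlation.
# Use 1 - abs(cor_mat) instead if a strong negative correlation
# should also count as "close".
dist_mat <- as.dist(1 - cor_mat)
print(dist_mat)
```

Which conversion is appropriate depends on whether negatively correlated variables should land in the same group or in different ones.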

Then use hierarchical clustering, which accepts a distance matrix.
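Put together, the two steps in this answer could look like the following in base R. This is a sketch under assumptions rather than a definitive recipe: the toy matrix is invented, `1 - r` is only one possible conversion, and both the average linkage and `k = 2` are picked by hand for illustration.

```r
# Toy symmetric correlation matrix with two obvious groups (made-up values).
ids <- c("v1", "v2", "v3", "v4")
cor_mat <- matrix(
  c(1.0, 0.9, 0.1, 0.0,
    0.9, 1.0, 0.0, 0.1,
    0.1, 0.0, 1.0, 0.8,
    0.0, 0.1, 0.8, 1.0),
  nrow = 4, byrow = TRUE, dimnames = list(ids, ids)
)

# Step 1: correlations -> distances (assumed convention: 1 - r).
d <- as.dist(1 - cor_mat)

# Step 2: hierarchical clustering on the distance object.
# Average linkage is an assumption; "complete" or "ward.D2" are alternatives.
hc <- hclust(d, method = "average")

# Cut the tree into k groups; k = 2 is chosen by hand here.
groups <- cutree(hc, k = 2)
print(groups)
```

Unlike k-means, `hclust()` involves no random initialization, so the same distance matrix yields the same tree on every run. Calling `plot(hc)` draws the dendrogram, which can help when choosing where to cut.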