Question

Here is a list of persons with their scores (Z):
library(dplyr)
set.seed(10)
df <- data.frame(X = sample(c("Male", "Female"), 40, replace = TRUE),
                 Y = sample(c("Graduate", "Non-graduate"), 40, replace = TRUE),
                 Z = 10 * runif(40),
                 stringsAsFactors = TRUE)  # keep X and Y as factors (default is FALSE since R 4.0)
df1 <- df %>% group_by(X, Y) %>% arrange(X, Y)
df1

(The output of df1 was shown as an image.)

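Within each group (female graduates, female non-graduates, male graduates, male non-graduates) we want to create clusters. In the end, every person needs a unique cluster ID, so the output is a vector of cluster IDs, one per person.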
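A small illustration of why "unique" matters here: if clustering is run separately inside each group, the cluster numbers restart at 1 in every group. A minimal sketch, assuming made-up vectors `group` and `local_cluster` (hypothetical names, not from the question), that maps each distinct (group, cluster) pair to its own integer ID:

```r
# Hypothetical per-person results: the group each person is in and the
# cluster number found *inside* that group (numbering restarts per group).
group <- c("Female.Graduate", "Female.Graduate", "Male.Graduate", "Male.Graduate")
local_cluster <- c(1, 2, 1, 2)

# Local numbers collide across groups, so combine both labels into one key
# and give each distinct key its own integer ID (in order of first appearance).
key <- paste(group, local_cluster, sep = "_")
cluster_id <- match(key, unique(key))
cluster_id  # 1 2 3 4
```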

Answer 1

Try:

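hc <- hclust(dist(scale(data.matrix(df1))))  # one tree over all 40 rows, using X, Y and Z as features
plot(hc)                                      # draw the dendrogram
View(newdf <- cbind(df1, cluster = cutree(hc, h = 0.5)))  # cut the tree at height 0.5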

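data.matrix converts your two factors to their numeric codes, and scale gives X, Y and Z equal weight. cutree then assigns each observation to a cluster by cutting the dendrogram at height 0.5.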
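A minimal sketch of what these three calls do, on a made-up three-row data frame (the names `d`, `m`, `s` and `hc` are illustrative only). It shows the factor becoming integer codes, every scaled column ending up with standard deviation 1, and `cutree` being asked for a number of clusters (`k`) instead of a height (`h`):

```r
# Small illustrative data frame; the values are made up for the demo.
d <- data.frame(X = factor(c("Male", "Female", "Male")),
                Z = c(2, 4, 9))

# data.matrix() replaces the factor by its integer level codes
# (levels sort alphabetically, so Female = 1 and Male = 2).
m <- data.matrix(d)
m

# scale() centres each column and divides it by its standard deviation,
# so X and Z contribute on the same scale to the Euclidean distance.
s <- scale(m)
apply(s, 2, sd)

# cutree(h = ...) cuts at a height; cutree(k = ...) asks for k clusters.
hc <- hclust(dist(s))
cutree(hc, k = 2)
```

Cutting by `k` is often easier to reason about than a fixed `h`, because the heights depend on how the features were scaled.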

Clustering within each group
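A minimal sketch of clustering separately inside each of the four X/Y groups and then building one unique ID per person. This is not the answerer's code: the helper `cluster_group`, the choice of at most 2 clusters per group (`k_max`), and clustering on Z alone (X and Y are constant inside a group) are all assumptions for illustration.

```r
library(dplyr)

set.seed(10)
df <- data.frame(
  X = sample(c("Male", "Female"), 40, replace = TRUE),
  Y = sample(c("Graduate", "Non-graduate"), 40, replace = TRUE),
  Z = 10 * runif(40)
)

# Hypothetical helper: cluster one group's Z values and return a local
# cluster number per row. Assumes at most k_max clusters per group.
cluster_group <- function(z, k_max = 2) {
  if (length(z) < 2) return(rep(1L, length(z)))  # hclust needs at least 2 points
  hc <- hclust(dist(z))
  cutree(hc, k = min(k_max, length(z)))
}

out <- df %>%
  group_by(X, Y) %>%
  mutate(local = cluster_group(Z)) %>%   # numbering restarts in every group
  ungroup() %>%
  mutate(cluster_id = as.integer(factor(paste(X, Y, local))))  # unique across groups

out$cluster_id
```

Because `cluster_id` is built from the group labels together with the local cluster number, two people share an ID only if they are in the same group and the same within-group cluster.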
