Question

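I am using R's built-in correlation matrix and hierarchical clustering functions to segment daily sales data into 10 clusters. I then want to create aggregated daily sales data per cluster. I have created a cutree() object, but I am struggling to extract, for example, the column names in the cutree object whose cluster number is 1.

For simplicity, I'll use the EuStockMarkets data set and cut the tree into 2 segments. Bear in mind that my real data has thousands of columns, so the solution needs to scale: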

data=as.data.frame(EuStockMarkets)

corrMatrix<-cor(data)
dissimilarity<-round(((1-corrMatrix)/2), 3)
distSimilarity<-as.dist(dissimilarity)
hirearchicalCluster<-hclust(distSimilarity)
treecuts<-cutree(hirearchicalCluster, k=2)

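Now I'm stuck. I want to extract only the column names from treecuts where, for example, the cluster number equals 1. However, the object produced by cutree() is not a data frame, which makes it awkward to subset. I tried converting treecuts to a data frame, but R does not create a column for the row names; it just coerces the numbers into a single column named treecuts.

I'd like to do the following: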

# ... code that converts treecuts into a data frame called "treeIDs" with the columns "Index" and "Cluster" ...

cluster1Columns<-colnames(treeIDs[Cluster==1, ])
cluster1DF<-data[ , (colnames(data) %in% cluster1Columns)]
rowSums(cluster1DF)

...and voilà, I'm done.

Thoughts/suggestions?

Answer 1

Here is one solution:

names(treecuts[which(treecuts[1:4]==1)])
[1] "DAX"  "SMI"  "FTSE"
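
The `[1:4]` index above is tied to the four-column example. Since the question mentions thousands of columns, a minimal sketch that avoids hard-coding the column count might look like the following. It assumes treecuts is the named integer vector that cutree() returns in the question's setup; the name cluster1Total is introduced here only for illustration.

```r
# Rebuild the objects from the question
data <- as.data.frame(EuStockMarkets)
corrMatrix <- cor(data)
distSimilarity <- as.dist(round((1 - corrMatrix) / 2, 3))
treecuts <- cutree(hclust(distSimilarity), k = 2)

# Logical indexing on the names works for any number of columns
cluster1Columns <- names(treecuts)[treecuts == 1]

# Keep only those columns (drop = FALSE keeps a data frame even for one column)
cluster1DF <- data[, cluster1Columns, drop = FALSE]

# Aggregate the cluster into a single series, one value per day
cluster1Total <- rowSums(cluster1DF)
```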

If you also want cluster 2 (or further clusters), you can use %in%:

names(treecuts[which(treecuts[1:4] %in% c(1,2))])

[1] "DAX"  "SMI"  "CAC"  "FTSE"
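
If the goal is an aggregated series for every cluster rather than one or two chosen by hand, one possible approach is to group the column names by cluster ID with split() and sum each group. This is a sketch under the question's setup; columnsByCluster and clusterTotals are illustrative names, not part of the original code.

```r
# Rebuild the objects from the question
data <- as.data.frame(EuStockMarkets)
corrMatrix <- cor(data)
distSimilarity <- as.dist(round((1 - corrMatrix) / 2, 3))
treecuts <- cutree(hclust(distSimilarity), k = 2)

# One character vector of column names per cluster ID
columnsByCluster <- split(names(treecuts), treecuts)

# One aggregated column per cluster: row sums over that cluster's columns
clusterTotals <- sapply(columnsByCluster, function(cols) {
  rowSums(data[, cols, drop = FALSE])
})
```

The result should be a matrix with one row per day and one column per cluster, so the same code applies unchanged with k = 10.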

Answer 2

Why not

data$clusterID <- treecuts

and then subset the data as usual?
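
One caveat, as far as I can tell: in the question's setup treecuts has one entry per column of data (the columns were clustered, not the rows), so assigning it as a new column would depend on R recycling the values down the rows. A lookup table keyed by column name may be closer to the "treeIDs" data frame the question asks for. The sketch below assumes the same objects as the question and uses the column names "Index" and "Cluster" from the question's pseudocode.

```r
# Rebuild the objects from the question
data <- as.data.frame(EuStockMarkets)
corrMatrix <- cor(data)
distSimilarity <- as.dist(round((1 - corrMatrix) / 2, 3))
treecuts <- cutree(hclust(distSimilarity), k = 2)

# One row per original column: its name and the cluster it was assigned to
treeIDs <- data.frame(Index = names(treecuts),
                      Cluster = unname(treecuts),
                      stringsAsFactors = FALSE)

# Subset the lookup table as usual, then pick the matching columns of data
cluster1Columns <- treeIDs$Index[treeIDs$Cluster == 1]
cluster1DF <- data[, colnames(data) %in% cluster1Columns, drop = FALSE]
```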

Manipulating a cutree object in R to split the original data frame
