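I am trying to use the awesome R package dendextend to plot a dendrogram and color its branches and labels according to a set of previously defined group labels. I have read your answers on Stack Overflow, as well as the FAQ in the dendextend vignette, but I am still not sure how to achieve my goal.
Let's say I have a data frame whose first column contains the names of the individuals to be clustered, followed by several columns containing the factors to analyze, and a last column containing the group of each individual (see the table below).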
individual 282856 282960 283275 283503 283572 283614 284015 group
pat15612 0 0 0 0 0 0 0 g2
pat38736 0 0 0 0 0 0 0 g2
pat38740 0 0 0 0 0 1 0 g2
pat38742 0 0 0 0 0 1 0 g4
pat38743 0 0 1 0 0 1 0 g3
pat38745 0 0 1 0 1 0 0 g4
pat38750 0 0 0 1 0 1 0 g4
pat38753 0 0 0 1 0 0 0 g3
pat40120 0 0 0 0 1 0 0 g4
pat40124 0 0 0 0 1 0 0 g4
pat40125 0 0 0 0 1 1 0 g4
pat40126 0 0 0 1 0 0 0 g4
pat40137 1 0 0 0 0 0 0 g4
pat40142 0 1 0 0 0 0 0 g5
pat46903 0 0 0 0 0 1 0 g1
pat67612 1 0 0 0 1 0 0 g1
pat67621 0 0 0 0 0 0 0 g2
pat67630 0 0 1 0 0 0 0 g2
pat67634 0 0 0 0 0 0 0 g5
pat67657 0 1 0 1 0 0 0 g5
pat67680 0 0 0 0 0 1 0 g5
pat67683 0 0 1 1 0 0 0 g6
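How can I color the branches and labels representing each individual according to the group they belong to, even though members of one group may end up clustered in different regions of the tree?
If that is possible, is there a way to define which color is assigned to each group?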
Answer 0 (score: 2)
I am glad you solved this on your own.
A simpler solution is to use the order_value = TRUE argument of the set function. For example:
library(dendextend)
iris2 <- iris[,-5]
rownames(iris2) <- paste(iris[,5],iris[,5],iris[,5], rownames(iris2))
dend <- iris2 %>% dist %>% hclust %>% as.dendrogram
dend <- dend %>% set("labels_colors", as.numeric(iris[,5]), order_value = TRUE) %>%
set("labels_cex", .5)
par(mar = c(4,1,0,8))
plot(dend, horiz = TRUE)
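This results in the following plot (as you can see, the label colors are based on another variable, "Species", from the iris dataset):
(P.S.: I tripled the number of times the species name appears in each label, so that the relation between the colors and the label lengths is easier to see.)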
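The same approach appears to carry over to the layout described in the question, with an explicitly chosen color per group. The following is a minimal sketch rather than a definitive implementation: the `factors` matrix and `group` vector are a six-row excerpt of the question's table, typed in by hand, and the `group_colors` palette is an arbitrary assumption.

```r
library(dendextend)

# Six rows copied from the question's table (illustrative excerpt only)
factors <- rbind(
  pat15612 = c(0, 0, 0, 0, 0, 0, 0),
  pat38742 = c(0, 0, 0, 0, 0, 1, 0),
  pat38743 = c(0, 0, 1, 0, 0, 1, 0),
  pat40142 = c(0, 1, 0, 0, 0, 0, 0),
  pat46903 = c(0, 0, 0, 0, 0, 1, 0),
  pat67683 = c(0, 0, 1, 1, 0, 0, 0)
)
# Group of each row, in the same order as the rows of `factors`
group <- c("g2", "g4", "g3", "g5", "g1", "g6")

# Hypothetical palette: one user-chosen color per group label
group_colors <- c(g1 = "green", g2 = "gold", g3 = "pink",
                  g4 = "purple", g5 = "blue", g6 = "red")

dend <- as.dendrogram(hclust(dist(factors)))

# Colors in data (row) order; order_value = TRUE asks set() to
# rearrange them into the order of the leaves in the dendrogram
colors_by_row <- unname(group_colors[group])
dend <- set(dend, "labels_colors", colors_by_row, order_value = TRUE)

plot(dend, main = "Labels colored by group")
```

Because the palette is a named vector, changing the color of a group only requires editing one entry of `group_colors`.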
Answer 1 (score: 1)
I was able to do it using another package called "sparcl", following an earlier post (How to colour the labels of a dendrogram by an additional factor variable in R).
Here is my code:
#load the dataset.....
#calculate distances
#stats::dist() has no "Jaccard" method; for 0/1 data, method="binary" is the Jaccard distance
d <- dist(dataset2, method="binary")
## Hierarchical cluster the data
hc <- hclust(d)
dend <- as.dendrogram(hc)
#create labels
labs=dataset$individual
#format to dendrogram
hcd = as.dendrogram(hc)
plot(hcd, cex=0.6)
# factor variable for colours
Var = dataset$group
# convert group labels to colours
varCol = gsub("g1.*","green",Var)
varCol = gsub("g2.*","gold",varCol)
varCol = gsub("g3.*","pink",varCol)
varCol = gsub("g4.*","purple",varCol)
varCol = gsub("g5.*","blue",varCol)
varCol = gsub("g6.*","red",varCol)
#colour-code dendrogram branches by a factor
library(sparcl)
ColorDendrogram(hc, y=varCol, branchlength=0.9, labels=labs,
xlab="", ylab="", sub="")
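The chain of `gsub()` calls above maps each group label to a colour one pattern at a time. A named-vector lookup is one possible shorter alternative. This is only a sketch: `Var` below is a small made-up stand-in for `dataset$group`, and it assumes every group label appears as a name in the hypothetical `group_colors` vector.

```r
# Made-up stand-in for dataset$group (assumption, for illustration only)
Var <- c("g2", "g4", "g3", "g5", "g1", "g6", "g4")

# Hypothetical palette: names are group labels, values are colour names
group_colors <- c(g1 = "green", g2 = "gold", g3 = "pink",
                  g4 = "purple", g5 = "blue", g6 = "red")

# Index the palette by label; as.character() also covers a factor column
varCol <- unname(group_colors[as.character(Var)])

# A label missing from the palette would show up as NA, so fail early
stopifnot(!anyNA(varCol))
print(varCol)
```

The resulting `varCol` can be passed to `ColorDendrogram()` in the same way as the vector built with `gsub()`.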
Finally, I managed to work out a "dendextend" solution based on the example in this post (How to colour the labels of a dendrogram by an additional factor variable in R):
# install.packages("dendextend")
library(dendextend)
#load the dataset.....
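# keep only the factor columns (columns 2 to 8 in the example table) and
# use the individual names as row names so they become the leaf labels
dataset2 <- dataset[, 2:8] #same dataset as in the example
rownames(dataset2) <- dataset$individual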
#calculate the dendrogram
dend <- as.dendrogram(hclust(dist(dataset2)))
#capture the colors from the "group" column
colors_to_use <- as.numeric(as.factor(dataset$group)) # as.factor() is needed when group is a character column
colors_to_use
# sort the colors based on their order in dend:
colors_to_use <- colors_to_use[order.dendrogram(dend)]
colors_to_use
#Apply colors
labels_colors(dend) <- colors_to_use
# Patient labels have a color based on their group
labels_colors(dend)
plot(dend, main = "Color in labels")
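The code above colors the labels only. Since the question also asks about branches, here is a rough sketch that colors each leaf's label and the branch segment leading to it, using only base R's `dendrapply()`. The `factors` matrix, the named `group` vector, and the `group_colors` palette are illustrative assumptions (a six-row excerpt of the question's table), not the original dataset.

```r
# Six rows copied from the question's table (illustrative excerpt only)
factors <- rbind(
  pat15612 = c(0, 0, 0, 0, 0, 0, 0),
  pat38742 = c(0, 0, 0, 0, 0, 1, 0),
  pat38743 = c(0, 0, 1, 0, 0, 1, 0),
  pat40142 = c(0, 1, 0, 0, 0, 0, 0),
  pat46903 = c(0, 0, 0, 0, 0, 1, 0),
  pat67683 = c(0, 0, 1, 1, 0, 0, 0)
)
# Group of each individual, named by individual so it can be looked up by label
group <- c(pat15612 = "g2", pat38742 = "g4", pat38743 = "g3",
           pat40142 = "g5", pat46903 = "g1", pat67683 = "g6")

# Hypothetical palette: one user-chosen color per group label
group_colors <- c(g1 = "green", g2 = "gold", g3 = "pink",
                  g4 = "purple", g5 = "blue", g6 = "red")

dend <- as.dendrogram(hclust(dist(factors)))

# Visit every node; only leaves are modified
color_leaf <- function(node) {
  if (is.leaf(node)) {
    leaf_col <- group_colors[[group[[attr(node, "label")]]]]
    # lab.col colors the label; pch = NA suppresses the leaf point symbol
    attr(node, "nodePar") <- list(lab.col = leaf_col, pch = NA)
    # edgePar colors the branch segment that ends at this leaf
    attr(node, "edgePar") <- list(col = leaf_col)
  }
  node
}
dend <- dendrapply(dend, color_leaf)

plot(dend, main = "Labels and leaf branches colored by group")
```

Looking the color up by leaf label avoids any manual reordering with `order.dendrogram()`. Inner branches that join leaves from several groups keep the default color in this sketch.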