I have a numeric matrix in R with 24 rows and 10,000 columns. The row names of this matrix are essentially the file names from which I read the data for each of the 24 rows. In addition, I have a separate factor with 24 entries specifying the group each of the 24 files belongs to. There are 3 groups: alcohols, hydrocarbons and esters. The names and the groups they belong to look like this:
> MS.mz
[1] "int-354.19" "int-361.35" "int-368.35" "int-396.38" "int-408.41" "int-410.43" "int-422.43"
[8] "int-424.42" "int-436.44" "int-438.46" "int-452.00" "int-480.48" "int-648.64" "int-312.14"
[15] "int-676.68" "int-690.62" "int-704.75" "int-312.29" "int-326.09" "int-326.18" "int-326.31"
[22] "int-340.21" "int-340.32" "int-352.35"
> MS.groups
[1] Alcohol Alcohol Alcohol Alcohol Hydrocarbon Alcohol Hydrocarbon Alcohol
[9] Hydrocarbon Alcohol Alcohol Alcohol Ester Alcohol Ester Ester
[17] Ester Alcohol Alcohol Alcohol Alcohol Alcohol Alcohol Hydrocarbon
Levels: Alcohol Ester Hydrocarbon
I wanted to generate a dendrogram to see how the data in the matrix cluster. So I used the following commands:
require(vegan)
dist.mat<-vegdist(MS.data.scaled.transposed,method="euclidean")
clust.res<-hclust(dist.mat)
plot(clust.res)
I got a dendrogram. Now I want to color the file names in the dendrogram according to the group they belong to (i.e. alcohol, hydrocarbon or ester). I looked at different examples posted on the forum, such as "Label and color leaf dendrogram in r" and "Label and color leaf dendrogram in R using ape package", but could not implement them for my data. I am not sure how to relate row.names to MS.groups to get colored names in the dendrogram.
When I generate the tree using dendextend (as explained in https://nycdatascience.com/wp-content/uploads/2013/09/dendextend-tutorial.pdf), I get the following tree.
Here is the code used to generate it:
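Answer 0 (score: 8)
I suspect the function you are looking for is either color_labels or get_leaves_branches_col. The first colors the labels based on cutree (as color_branches does). The second lets you get the branch color of each leaf and then use it to color the labels of the tree, which is useful if you colored the branches in an unusual way (for example with branches_attr_by_labels). For example: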
# define dendrogram object to play with:
hc <- hclust(dist(USArrests[1:5,]), "ave")
dend <- as.dendrogram(hc)
library(dendextend)
par(mfrow = c(1,2), mar = c(5,2,1,0))
dend <- dend %>%
color_branches(k = 3) %>%
set("branches_lwd", c(2,1,2)) %>%
set("branches_lty", c(1,2,1))
plot(dend)
dend <- color_labels(dend, k = 3)
# The same as:
# labels_colors(dend) <- get_leaves_branches_col(dend)
plot(dend)
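Either way, you should have a look at the set function to get an idea of what can be done with your dendrogram (it saves you the hassle of remembering all the different function names).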
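Note that color_labels(dend, k = 3) colors by cutree clusters, not by a known grouping such as the question's MS.groups factor. The following is a minimal sketch of coloring the labels by an external factor instead. The toy matrix, the toy.groups factor and the palette are hypothetical stand-ins rather than the asker's data; the only dendextend call used is the labels_colors<- assignment already mentioned in the comment above.

```r
library(dendextend)

# Hypothetical stand-in for the question's data: 6 samples in 3 groups.
set.seed(1)
toy <- matrix(rnorm(6 * 4), nrow = 6,
              dimnames = list(paste0("int-", 1:6), NULL))
toy.groups <- factor(c("Alcohol", "Ester", "Alcohol",
                       "Hydrocarbon", "Ester", "Alcohol"))

dend <- as.dendrogram(hclust(dist(toy)))

# One colour per factor level (palette chosen arbitrarily).
group.cols <- c(Alcohol = "red", Ester = "blue", Hydrocarbon = "darkgreen")

# Leaves are drawn in dendrogram order, not row order, so the groups
# have to be reordered with order.dendrogram() before assigning colours.
leaf.groups <- toy.groups[order.dendrogram(dend)]
labels_colors(dend) <- unname(group.cols[as.character(leaf.groups)])

plot(dend)
```

With the question's objects, toy would presumably correspond to MS.data.scaled.transposed and toy.groups to MS.groups, provided the factor is in the same order as the matrix rows.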
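Answer 1 (score: 1)
You can look at this tutorial, which shows several ways to display dendrograms by group in R:
https://rstudio-pubs-static.s3.amazonaws.com/1876_df0bf890dd54461f98719b461d987c3d.html
However, I think the best solution for your data is the one provided by the 'dendextend' package. See the tutorial (its example on the 'iris' dataset is similar to your problem): https://nycdatascience.com/wp-content/uploads/2013/09/dendextend-tutorial.pdf
See also the vignette: http://cran.r-project.org/web/packages/dendextend/vignettes/Cluster_Analysis.html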
Answer 2 (score: 0)
You can try this solution: just replace 'labs' with 'MS.groups', and 'var' with your 'MS.groups' converted to numeric (perhaps with as.numeric). It comes from "How to colour the labels of a dendrogram by an additional factor variable in R".
## The data
df <- structure(list(labs = c("a1", "a2", "a3", "a4", "a5", "a6", "a7",
"a8", "b1", "b2", "b3", "b4", "b5", "b6", "b7"), var = c(1L, 1L, 2L,
1L,2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L), td = c(13.1, 14.5, 16.7,
12.9, 14.9, 15.6, 13.4, 15.3, 12.8, 14.5, 14.7, 13.1, 14.9, 15.6, 14.6),
fd = c(2L, 3L, 3L, 1L, 2L, 3L, 2L, 3L, 2L, 4L, 2L, 1L, 4L, 3L, 3L)),
.Names = c("labs", "var", "td", "fd"), class = "data.frame", row.names =
c(NA, -15L))
## Subset for clustering
df.nw = df[,3:4]
# Assign the labs column to a vector
labs = df$labs
d = dist(as.matrix(df.nw)) # find distance matrix
hc = hclust(d, method="complete") # apply hierarchical clustering
## plot the dendrogram
plot(hc, hang=-0.01, cex=0.6, labels=labs, xlab="")
## convert hclust to dendrogram
hcd = as.dendrogram(hc)
## plot using dendrogram object
plot(hcd, cex=0.6)
Var = df$var # factor variable for colours
varCol = gsub("1","red",Var) # convert numbers to colours
varCol = gsub("2","blue",varCol)
# colour-code dendrogram branches by a factor
# ... your code
colLab <- function(n) {
if(is.leaf(n)) {
a <- attributes(n)
attr(n, "label") <- labs[a$label]
attr(n, "nodePar") <- c(a$nodePar, lab.col = varCol[a$label])
}
n
}
## Coloured plot
plot(dendrapply(hcd, colLab))
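The code above looks colors up by leaf position and builds them with gsub on numeric codes. When the matrix has row names, as in the question, a variant that looks colors up by label name may be easier to adapt. This is a sketch in base R only; the toy matrix, the toy.groups factor, the palette and the colour_leaf helper are hypothetical names introduced for illustration.

```r
# Hypothetical stand-in data: 8 named samples in 3 groups.
set.seed(42)
toy <- matrix(rnorm(8 * 5), nrow = 8,
              dimnames = list(paste0("file-", 1:8), NULL))
toy.groups <- factor(rep(c("Alcohol", "Ester", "Hydrocarbon", "Alcohol"),
                         times = 2))

# Map each factor level to a colour, then name the result by row name
# so that a leaf label can be used directly as the lookup key.
group.cols <- c(Alcohol = "red", Ester = "blue", Hydrocarbon = "darkgreen")
label.cols <- setNames(group.cols[as.character(toy.groups)], rownames(toy))

hcd <- as.dendrogram(hclust(dist(toy)))

# Set the label colour on every leaf; pch = NA hides the leaf marker.
colour_leaf <- function(n) {
  if (is.leaf(n)) {
    a <- attributes(n)
    attr(n, "nodePar") <- c(a$nodePar,
                            list(lab.col = label.cols[[a$label]], pch = NA))
  }
  n
}

coloured <- dendrapply(hcd, colour_leaf)
plot(coloured)
```

Because the lookup is by name, the result does not depend on the order in which the leaves end up in the tree.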