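I have n observations, for which I have computed m clusterings. The clusterings I generated are in fact hierarchical splits, even though they were computed independently. Here is a subset of my data: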
print(test)
     m_0 m_13000 m_14608 m_16278
   <dbl>   <dbl>   <dbl>   <dbl>
 1     1      10     101    1001
 2     1      10     101    1002
 3     1      11     102    1003
 4     1      11     102    1004
 5     1      12     103    1005
 6     1      12     104    1006
 7     2      13     105    1007
 8     2      13     106    1008
 9     2      13     106    1009
10     2      14     107    1010
..   ...     ...     ...     ...
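Each row i = 1:n is an observation, and each column j = 1:m gives that observation's membership under clustering j. Cluster IDs are unique across clustering solutions, i.e. min(test[, j]) > max(test[, j-1]).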
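A minimal sketch of how the two properties described above (IDs increasing across columns, and each clustering being a split of the previous one) could be checked. The data frame is rebuilt from the ten rows shown in the printout; the names ids_increase and is_nested are hypothetical helpers, not part of the original post.

```r
# Toy data copied from the ten rows printed above
test <- data.frame(
  m_0     = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  m_13000 = c(10, 10, 11, 11, 12, 12, 13, 13, 13, 14),
  m_14608 = c(101, 101, 102, 102, 103, 104, 105, 106, 106, 107),
  m_16278 = 1001:1010
)

# Cluster IDs are unique across solutions:
# each column's minimum exceeds the previous column's maximum
ids_increase <- vapply(
  2:ncol(test),
  function(j) min(test[, j]) > max(test[, j - 1]),
  logical(1)
)

# The splits are nested: every cluster in column j
# belongs to exactly one cluster in column j - 1
is_nested <- vapply(
  2:ncol(test),
  function(j) {
    parents_per_child <- tapply(
      test[, j - 1], test[, j],
      function(p) length(unique(p))
    )
    all(parents_per_child == 1)
  },
  logical(1)
)

all(ids_increase)
all(is_nested)
```

If either check fails on the full data, the columns do not form a single tree and would need cleaning before any conversion to a merge matrix.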
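The observations are represented as vertices on an igraph graph. I would like to convert the test data above into a merges matrix that I can pass to igraph::make_clusters for further manipulation. What is the best way to do this? I have looked at the merges matrix created in this example, but I don't really understand it. Can anyone help?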
Answer 0 (score: 1)
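My solution ended up being to use a modified version of the answer to a related SO question about dendrograms to convert the data frame to a Newick tree string, and then to read the resulting string into a phylo object using phytools::read.newick, which I can convert to an hclust object using ape::as.hclust (if necessary). Not too bad!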
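Since the question asks what a merge matrix looks like, here is a small sketch using only base R. It shows the convention documented for hclust objects: each row is one merge, a negative entry is an original observation, and a positive entry refers to the cluster formed in that earlier row. The five points are made up for illustration, and igraph's own merges format is not shown here, so check its documentation before passing a matrix to igraph::make_clusters.

```r
# Five points on a line; the values are chosen only for illustration
x <- c(a = 0, b = 1, c = 5, d = 6, e = 20)

# Single-linkage clustering with base R (stats package)
hc <- hclust(dist(x), method = "single")

# One row per merge, so n - 1 rows and 2 columns
hc$merge

# Heights at which the merges happen
hc$height
```

Reading the rows from top to bottom reproduces the order in which clusters were joined, which is the same information a dendrogram draws from the leaves upward.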
(Slightly edited) solution
Note: these functions don't seem to play well with tibbles, so use standard data.frames instead.
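df2newick <- function(df, innerlabel = FALSE){
  # Recursively build the Newick string for value `a` found in column `i`
  traverse <- function(a, i, innerl){
    if(i < (ncol(df))){
      # Distinct child values in the next column for rows matching `a`
      alevelinner <- as.character(
        unique(df[which(as.character(df[,i]) == a), i + 1])
      )
      desc <- NULL
      for(b in alevelinner)
        desc <- c(desc, traverse(b, i + 1, innerl))
      il <- NULL
      # Kept as posted; standard Newick writes an inner label
      # directly after ")" with no comma
      if(innerl==TRUE)
        il <- paste0(",", a)
      (newickout <- paste("(", paste(desc,collapse = ","), ")", il,
                          sep=""))
    }
    else {
      # Last column reached: the value is a tip label
      (newickout <- a)
    }
  }
  # Start one traversal per distinct value in the first column
  alevel <- as.character(unique(df[,1]))
  newick <- NULL
  for(x in alevel)
    newick <- c(newick, traverse(x, 1, innerlabel))
  (newick <- paste("(", paste(newick, collapse = ","), ");", sep=""))
}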
Reproducible example
ex = structure(list(level.1 = c("1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1"), level.2 = c("883", "883", "883",
"883", "883", "883", "883", "883", "1758", "883", "883", "883",
"883"), level.3 = c("2293", "2293", "2293", "2293", "2293", "2293",
"2293", "2293", "3240", "2293", "2293", "2293", "2293"), level.4 = c("3932",
"3932", "3932", "3932", "3932", "3932", "3932", "3932", "5139",
"5777", "3932", "3932", "3932"), level.5 = c("6056", "6056",
"6056", "6056", "6056", "6056", "6056", "6056", "7472", "8110",
"6056", "6056", "6056"), level.6 = c("8456", "8545", "8949",
"8456", "8545", "8456", "8545", "8545", "10385", "11023", "8545",
"8545", "8545"), level.7 = c("11525", "11635", "12084", "12297",
"12339", "12297", "12339", "12339", "13632", "14270", "12339",
"12339", "12339"), name = c("A", "B", "C", "D", "E", "F", "G",
"H", "I", "J", "K", "L", "M")), class = "data.frame", .Names = c("level.1",
"level.2", "level.3", "level.4", "level.5", "level.6", "level.7",
"name"), row.names = c(NA, -13L))
treestring = df2newick(ex, innerlabel = FALSE)
library(phytools)
extree = collapse.singles(read.newick(text = treestring))
extree$node.label = head(names(ex), -1)
plot(extree, show.node.label = TRUE)
Answer 1 (score: 1)
Another (very simple) solution is to use the data.tree package.
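library(data.tree)
# as.Node() on a data.frame expects a path column (named pathString by default),
# so build one by joining the levels of each row with "/"
ex_paths = ex
ex_paths$pathString = apply(ex, 1, paste, collapse = "/")
tree = as.Node(ex_paths)
library(ape)
ph = as.phylo(tree)
as.hclust(ph)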
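Note, however, that you need some way of defining branch lengths in order to convert to an hclust object. The same constraint applies to my other answer.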
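One possible way to supply branch lengths, sketched here under assumptions rather than taken from the original answers: ape::compute.brlen with Grafen's method assigns lengths that make the tree ultrametric, which ape's as.hclust method for phylo objects requires. The four-tip Newick string is a made-up toy tree. The conversion also requires a binary tree, so a tree with polytomies (such as the example above) may first need something like ape::multi2di.

```r
library(ape)

# Toy binary tree without branch lengths (made up for illustration)
tr <- read.tree(text = "((A,B),(C,D));")

# Assign branch lengths with Grafen's method, giving an ultrametric tree
tr <- compute.brlen(tr, method = "Grafen")

# Convert to hclust; this needs an ultrametric, binary tree
hc <- as.hclust(tr)

# Merge matrix of the resulting hclust object
hc$merge
```

The lengths produced this way are arbitrary with respect to the data; they only make the conversion possible and should not be read as real merge heights.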