Question

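I want to set "edge.length" in a phylo object using a variable from a data.frame. The "node.label" and "tip.label" elements of the phylo object correspond to the row names of the data.frame. How can I set edge.length from a variable in the data.frame while making sure the data are matched correctly? In the code below, this is step 3. I want the edge lengths to be matched so that each node.label or tip.label lines up with the corresponding row name in the data.frame.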

## R code:
## Load ape and data.tree (data.tree provides Node and as.phylo.Node)
library(ape)
library(data.tree)

## 1. A phylo object:

A1  <- Node$new("A1")
B1  <- A1$AddChild("B1")
C1  <- B1$AddChild("C1")
D1  <- C1$AddChild("D1")
E1 <- C1$AddChild("E1")
F1 <- E1$AddChild("F1")
G1 <- E1$AddChild("G1")
H1 <- G1$AddChild("H1")
A1.phylo <- as.phylo.Node(A1)


## 2. A data.frame:
set.seed(1)
df <- as.data.frame(rnorm(7, 5, 3))
names(df) <- "length"
row.names(df) <- c("B1","C1","D1","E1","F1","G1","H1")

## 3. Add the data to A1.phylo$edge.length
A1.phylo$edge.length <- df$length ## wrong!!!

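Answer 1

The edge lengths, tip labels and node labels in a "phylo" object are handled in the order in which they appear in the edge table. You should therefore always make sure that values are in the correct order before assigning them to the different elements. For example (sorry, I couldn't reproduce your example):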

set.seed(1)
## A random tree with 6 edges
test_tree <- rtree(4)

## The edge table
test_tree$edge
#     [,1] [,2]
#[1,]    5    1
#[2,]    5    6
#[3,]    6    2
#[4,]    6    7
#[5,]    7    3
#[6,]    7    4

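Here the edges are all the elements connecting a node (number > 4) to a tip (number < 5) or to another node. You can visualise them (and their numbers) with plot: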

## Visualising all the elements
plot(test_tree, show.tip.label = FALSE)
edgelabels()
nodelabels()
tiplabels()
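
As a minimal sketch of the numbering convention described above, the following base R snippet rebuilds the edge table shown earlier by hand and flags which edges end at a tip. The variable names (edge, n_tip, is_terminal) are illustrative only and are not part of ape.

```r
## Edge table copied from the output above: column 1 = parent, column 2 = child
edge <- matrix(c(5, 1,
                 5, 6,
                 6, 2,
                 6, 7,
                 7, 3,
                 7, 4), ncol = 2, byrow = TRUE)

## Tips are numbered 1..n_tip, internal nodes start at n_tip + 1
n_tip <- 4

## An edge is terminal when its child (second column) is a tip
is_terminal <- edge[, 2] <= n_tip

which(is_terminal)   # edges 1, 3, 5 and 6 end at a tip
which(!is_terminal)  # edges 2 and 4 connect two internal nodes
```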

Now, if you have a data frame like this:

## A random data frame
df <- as.data.frame(rnorm(6))
names(df) <- "length"
## The edges in the "wrong" order
row.names(df) <- sample(1:6)

you can assign the values to the edges in the correct order with:

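## Order the rows by edge number (row names are character, so convert them first)
## Note: df$length[sort(rownames(df))] returns NA because df$length carries no names
test_tree$edge.length <- df[order(as.integer(rownames(df))), "length"]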

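In this case the sorting is straightforward because the edge names in df are numbers, but the logic is that the first element of test_tree$edge.length should be the length of the edge connecting node 5 to tip 1, and so on.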
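
When the data frame is keyed by tip and node labels rather than by edge number, as in the question, the same logic can be sketched by looking up each edge through the label of its child node. The snippet below is a base R illustration under stated assumptions: the edge table is the one printed above, and the labels ("t1" to "t4", "n5" to "n7") and length values are made up. With an ape tree, tip_label and node_label would correspond to tree$tip.label and tree$node.label.

```r
## Edge table copied from the output above: column 1 = parent, column 2 = child
edge <- matrix(c(5, 1,  5, 6,  6, 2,  6, 7,  7, 3,  7, 4),
               ncol = 2, byrow = TRUE)

## Hypothetical labels: tips are nodes 1..4, internal nodes are 5..7
tip_label  <- c("t1", "t2", "t3", "t4")
node_label <- c("n5", "n6", "n7")
all_labels <- c(tip_label, node_label)  # position = node number

## Label of the child node at the end of each edge, in edge-table order
child_labels <- all_labels[edge[, 2]]
child_labels
# "t1" "n6" "t2" "n7" "t3" "t4"

## Hypothetical data frame with one row per label, deliberately shuffled
df <- data.frame(length = c(0.6, 0.1, 0.5, 0.3, 0.2, 0.4),
                 row.names = c("n7", "t1", "t3", "t4", "n6", "t2"))

## Index by row name so each edge receives the length of its child node
edge_length <- df[child_labels, "length"]
edge_length
# 0.1 0.2 0.4 0.6 0.5 0.3
```

With a real tree, the last vector would be assigned to the tree's edge.length element. The root ("n5" here) has no incoming edge, so it needs no row in the data frame.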

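Again, because your example isn't reproducible, it's hard to work out what went wrong, but I would guess that your df$length does not have the right length.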
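
A mismatch of this kind can be caught before assignment. The following is a rough sketch, not an ape function: check_edge_data is a hypothetical helper that takes the child label of every edge (in edge-table order) and the data frame, and stops if any edge has no matching row. The labels and values are made up for illustration.

```r
## Hypothetical helper: stop early if the data frame cannot cover every edge
check_edge_data <- function(child_labels, df) {
  absent <- setdiff(child_labels, rownames(df))
  if (length(absent) > 0) {
    stop("no length for: ", paste(absent, collapse = ", "))
  }
  if (nrow(df) != length(child_labels)) {
    warning("data frame has ", nrow(df), " rows but the tree has ",
            length(child_labels), " edges")
  }
  invisible(TRUE)
}

## Made-up child labels for a tree with six edges
child_labels <- c("t1", "n6", "t2", "n7", "t3", "t4")

## One data frame that covers every edge and one that lacks "n6"
ok  <- data.frame(length = 1:6, row.names = rev(child_labels))
bad <- data.frame(length = 1:5, row.names = child_labels[-2])

check_edge_data(child_labels, ok)  # passes silently

res <- tryCatch(check_edge_data(child_labels, bad),
                error = function(e) conditionMessage(e))
res
# "no length for: n6"
```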
