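Distance between species in the Open Tree of Life

Time: 2016-03-03 00:05:37

Tags: r phylogeny

Greetings and salutations,

I am trying to find the relative distance between species in the Open Tree of Life (OTOL). I am using the fastDist() function from the phytools R package to count the branches between tips of the tree. However, the function raises an error while looking up ancestors.

Debug information: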

Error in while (currnode != rt) { : argument is of length zero 
4 getAncestors(tree, sp1) 
3 fastHeight(tree, sp2, sp2) 
2 phytools::fastDist(tree, resolved_names_proper[i], resolved_names_proper[j]) 
1 get_distance(tree, species = c("Abies grandis", "Abies concolor", 
    "Abies lasiocarpa")) 

The offending code is:

phytools:::getAncestors = function (tree, node, type = c("all", "parent")) 
{
    if (!inherits(tree, "phylo")) 
        stop("tree should be an object of class \"phylo\".")
    type <- type[1]
    if (type == "all") {
        aa <- vector()
        rt <- length(tree$tip.label) + 1
        currnode <- node
        while (currnode != rt) { #### error here
            currnode <- getAncestors(tree, currnode, "parent")
            aa <- c(aa, currnode)
        }
        return(aa)
    }
    else if (type == "parent") {
        aa <- tree$edge[which(tree$edge[, 2] == node), 1]
        return(aa)
    }
    else stop("do not recognize type")
}

Tree information:

Phylogenetic tree with 304959 tips and 23328 internal nodes.

Tip labels:
    Leucas_martinicensis_ott9739, Leucas_deflexa_var_deflexa_ott531221, Leonotis_ocymifolia_var_schinzii_ott480842, Leonotis_ocymifolia_var_raineriana_ott480829, Leonotis_nepetifolia_var_africana_ott480834, Leonotis_nepetifolia_var_nepetifolia_ott480833, ...
Node labels:
    Chloroplastida_ott361838, Streptophyta_ott916750, , , , Embryophyta_ott5342313, ...

Unrooted; includes branch lengths.

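Is it possible that the tree does not have the correct node labels specified (for example, some node labels are empty)? For instance, tnrs_match_names('Abies lasiocarpa') returns a value, but searching tree$node.label and tree$tip.label for it finds nothing.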
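One way to check where a label actually sits is to search the tip labels and the node labels separately. The sketch below uses a small toy tree, not the OTOL data, and assumes for illustration that the species label is attached to an internal node; locate_label() is a hypothetical helper, not part of ape or phytools. It also shows that a zero-length node index is enough to produce the message seen in the traceback.

```r
library(ape)

# Toy tree, NOT the OTOL data: the species label sits on an internal node
# because two hypothetical varieties hang below it (an assumption).
toy <- read.tree(text = paste0(
  "((Abies_lasiocarpa_var_x_ott1:1,Abies_lasiocarpa_var_y_ott2:1)",
  "Abies_lasiocarpa_ott85998:1,",
  "(Abies_amabilis_ott876303:1,Abies_concolor_ott876315:1):1)Abies_ott3;"))

# Hypothetical helper: report where a label occurs in a phylo object.
# Internal nodes are numbered after the tips, hence the Ntip() offset.
locate_label <- function(tree, label) {
  list(tip  = which(tree$tip.label == label),
       node = which(tree$node.label == label) + Ntip(tree))
}

hit <- locate_label(toy, "Abies_lasiocarpa_ott85998")
length(hit$tip)   # no tip carries this label in the toy tree
length(hit$node)  # one internal node does

# A zero-length index in a while() condition raises an error;
# capture its message instead of stopping the script.
msg <- tryCatch({ while (integer(0) != 5L) {}; "no error" },
                error = function(e) conditionMessage(e))
msg
```

If the real tree behaves like this toy one, a tip-only lookup would hand an empty index to getAncestors(), which is consistent with the failure at the while() line.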

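The specific example that triggers this error arises when I try to find the distance between branches within the same genus (Abies). For now, I use tryCatch() so that building the matrix can continue, but it would be great to obtain an actual value.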

MWE:

## Initialize Data

# Any package that is required by the script below is given here:
inst_pkgs = load_pkgs =  c("ape","phytools","R.utils","rotl")
inst_pkgs = inst_pkgs[!(inst_pkgs %in% installed.packages()[,"Package"])]
if(length(inst_pkgs)) install.packages(inst_pkgs)

# Dynamically load packages
pkgs_loaded = lapply(load_pkgs, require, character.only=T)

# Grab the Chloroplastida tree
input_tree = file.path(tempdir(), "chloroplastida.tre.gz")
download.file(url="http://files.opentreeoflife.org/trees/v3subtrees/chloroplastida.tre.gz",destfile=input_tree)
gunzip(input_tree)

input_tree_final = dir(tempdir(), pattern=glob2rx("*.tre"),full.names=T)

# Now read in tree as a phylo object (from ape)
MyTree = read.tree(input_tree_final)

# List of Species
species = c("Abies amabilis","Abies concolor","Abies lasiocarpa")

# Look up "proper" names of species used in tree:
resolved_names = tnrs_match_names(species) # Finds the matching names...


## Output of resolved_names
##     search_string      unique_name approximate_match ott_id is_synonym is_deprecated number_matches
##1   abies amabilis   Abies amabilis             FALSE 876303      FALSE         FALSE              1
##2   abies concolor   Abies concolor             FALSE 876315      FALSE         FALSE              1
##3 abies lasiocarpa Abies lasiocarpa             FALSE  85998      FALSE         FALSE              1


# Make taxa names for querying the tree:
resolved_names_proper = paste(gsub(" ","_",resolved_names$unique_name),"_ott",resolved_names$ott_id,sep="")

## Output of resolved_names_proper
## "Abies_amabilis_ott876303"  "Abies_concolor_ott876315"  "Abies_lasiocarpa_ott85998"

# Single tests between species (can be used so you don't need to pre-calculate all species):
test_distance_ok = fastDist(MyTree,resolved_names_proper[1],resolved_names_proper[2])
test_distance_bad = fastDist(MyTree,resolved_names_proper[1],resolved_names_proper[3])

Resulting distance matrix:

                 Abies amabilis Abies concolor Abies lasiocarpa

Abies amabilis                0              4               NA
Abies concolor                4              0               NA
Abies lasiocarpa             NA             NA                0
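A possible way to fill the NA cells, sketched under the assumption that the missing species is labelled on an internal node rather than a tip, is to count edges by walking parent pointers in tree$edge and to accept both tip and node labels. The toy tree and the helpers path_to_root() and edge_distance() are hypothetical and only illustrate the idea; they are not phytools functions.

```r
library(ape)

# Toy tree, NOT the OTOL data (hypothetical layout with the species on a node).
toy <- read.tree(text = paste0(
  "((Abies_lasiocarpa_var_x_ott1:1,Abies_lasiocarpa_var_y_ott2:1)",
  "Abies_lasiocarpa_ott85998:1,",
  "(Abies_amabilis_ott876303:1,Abies_concolor_ott876315:1):1)Abies_ott3;"))

# Walk parent pointers until no edge leads into the current node (the root).
path_to_root <- function(tree, node) {
  path <- node
  repeat {
    parent <- tree$edge[match(node, tree$edge[, 2]), 1]
    if (is.na(parent)) break
    path <- c(path, parent)
    node <- parent
  }
  path
}

# Hypothetical helper: number of edges between two labelled tips or nodes.
edge_distance <- function(tree, label_a, label_b) {
  ids <- match(c(label_a, label_b), c(tree$tip.label, tree$node.label))
  if (anyNA(ids)) return(NA_integer_)      # label absent from the tree
  pa <- path_to_root(tree, ids[1])
  pb <- path_to_root(tree, ids[2])
  mrca <- intersect(pa, pb)[1]             # first shared node on a's path
  as.integer((match(mrca, pa) - 1) + (match(mrca, pb) - 1))
}

labels <- c("Abies_amabilis_ott876303", "Abies_concolor_ott876315",
            "Abies_lasiocarpa_ott85998")
dist_mat <- outer(labels, labels,
                  Vectorize(function(a, b) edge_distance(toy, a, b)))
dimnames(dist_mat) <- list(labels, labels)
dist_mat
```

This version does not assume that the root is node Ntip + 1; it stops when no parent edge exists. A label missing from both tips and nodes still gives NA, so the matrix loop needs no tryCatch().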

Edit

Building the tree with the rotl package, I get:

resolved_names = tnrs_match_names(species)
tr = tol_induced_subtree(ott_ids=resolved_names$ott_id)

The tree is built:

# tr
##    
## Phylogenetic tree with 3 tips and 2 internal nodes.
##   
## Tip labels:
## [1] "Abies_lasiocarpa_ott85998" "Abies_amabilis_ott876303"  "Abies_concolor_ott876315" 
##   
## Rooted; no branch lengths. 

However, the branch length information is lost, so a new error appears:

 Error in phytools::fastDist(tree, resolved_names_proper[i], resolved_names_proper[j]) : 
  tree should have edge lengths. 
3 stop("tree should have edge lengths.") 
2 phytools::fastDist(tree, resolved_names_proper[i], resolved_names_proper[j]) 
1 get_distance(tr, species)
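If only edge counts are wanted, one workaround is to give every edge a length of 1 before computing distances. The sketch below uses ape's compute.brlen() with a numeric argument and cophenetic() on a stand-in tree; the topology written here is an assumption, since the printed summary does not show it. Because the induced subtree keeps only 2 internal nodes, the counts need not match those obtained from the full tree.

```r
library(ape)

# Stand-in for the induced subtree: topology only, no branch lengths.
# The grouping of the three tips is assumed for illustration.
tr <- read.tree(text = paste0(
  "((Abies_amabilis_ott876303,Abies_concolor_ott876315),",
  "Abies_lasiocarpa_ott85998);"))
is.null(tr$edge.length)   # no branch lengths, which is what fastDist() rejects

# Assign length 1 to every edge, then sum path lengths between all tip pairs.
tr_unit <- compute.brlen(tr, 1)
d <- cophenetic(tr_unit)
d
```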

I tried to obtain the Chloroplastida tree directly, but the API will not return it:

m = tnrs_match_names("chloroplastida")
tree = tol_subtree(ott_id = m$ott_id[1])

with the error message:

Error in otl_check_error(req) : 
  Message: Requested tree is larger than currently allowed by this service (25000 tips). For larger trees, please download the full tree directly from: http://files.opentreeoflife.org/trees/

Hence the direct download of the subtree above.

In addition, I tried to download and load the full draft v3 or v4 tree:

# Grab the entire tree
input_tree = file.path(tempdir(), "draftversion3.tre.gz")
download.file(url="http://files.opentreeoflife.org/trees/draftversion3.tre.gz",destfile=input_tree)
gunzip(input_tree)

input_tree_final = dir(tempdir(), pattern=glob2rx("*.tre"),full.names=T)

# Now read in tree as a phylo object (from ape)
MyTree = read.tree(input_tree_final)

This returns the error message:

Error in if (sum(obj[[i]]$edge[, 1] == ROOT) == 1 && dim(obj[[i]]$edge)[1] >  : 
  missing value where TRUE/FALSE needed

1 Answer:

Answer 0: (score: 0)

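I have written a parser for .tre files in JS, in fact for draftversion3 itself. I understand the problem, but this code is completely foreign to me, so I will only explain the parsing and add the theory for obtaining the distance...

Species and nodes are distinguished by certain markers: the "ott" prefix followed by a number (like OTT00000), and parentheses.

If I had to do what you ask in simple code, I could do it fairly easily: count forward through the parentheses to find the species between the two of interest, and count backward to find the whole tree structure that contains both of them...

It is a simple task: roughly speaking, you just count +1 for "(" and -1 for ")".

Once you have the whole subtree containing both species, it is relatively easy to backtrack through the nodes and add up the distances for the two species until you reach their shared tree node.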
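A minimal sketch of that parenthesis counting, written in R to match the question rather than in JS, might look like the following. It works directly on the Newick string, counts edges between two tips only, and rests on several assumptions: labels contain no parentheses, each label occurs once, and no label is a prefix of another. The function name newick_edge_count() and the toy string are hypothetical.

```r
# Hypothetical helper: count edges between two tips from the Newick text alone.
newick_edge_count <- function(newick, tip_a, tip_b) {
  pos <- as.integer(c(regexpr(tip_a, newick, fixed = TRUE),
                      regexpr(tip_b, newick, fixed = TRUE)))
  if (any(pos < 0)) return(NA_integer_)   # a label was not found
  if (pos[1] == pos[2]) return(0L)        # same tip

  # Characters from the start of the first label to the start of the second.
  chars <- strsplit(substr(newick, min(pos), max(pos)), "")[[1]]

  # Running depth relative to the first tip: +1 for "(", -1 for ")".
  depth <- c(0, cumsum((chars == "(") - (chars == ")")))
  lowest <- min(depth)                    # depth of the shared clade
  final  <- depth[length(depth)]          # depth at the second tip

  # Edges climbed from each tip to the shared node.
  as.integer((0 - lowest + 1) + (final - lowest + 1))
}

# Toy Newick string, NOT the OTOL file.
nwk <- paste0("((Abies_amabilis_ott876303:1,Abies_concolor_ott876315:1):1,",
              "Abies_lasiocarpa_ott85998:1);")
newick_edge_count(nwk, "Abies_amabilis_ott876303", "Abies_concolor_ott876315")
newick_edge_count(nwk, "Abies_amabilis_ott876303", "Abies_lasiocarpa_ott85998")
```

The lowest depth reached between the two labels marks their shared clade, so each tip contributes the number of levels it sits below that clade plus one edge.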

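As for the extra functionality the code provides, I cannot say what those functions do, but the basic logic is very simple. I hope this helps in finding the error.

I used markers of the OTT0000 type to tell species and nodes apart.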