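Collapsing branches in a phylogenetic tree based on partially matching taxon labels in R

Time: 2019-07-08 14:18:24

Tags: r tree collapse phylogeny

I built a phylogenetic tree for a bacterial DNA region in which tips of the same bacterial species usually cluster in tight branches. Now I would like to collapse the branches that share a common label. I tried to define the labels to collapse with the following keywords, each of which partially matches the terminal taxon names:

Keywords: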

("vulneris","ulcerans","blattae","coli","hermannii","albertii","periodonticum","fergusonii")

In R, I loaded the following Newick file (file.newick):

(((((((((E_vulneris_otu44:0.03924,((E_vulneris_otu97:0.00766,
E_vulneris_otu96:0)0.8:0.00914,E_fergusonii_otu74:0.00725)0:0.0072)0:0,
((E_vulneris_otu95:0,
(((gi_undefined_HMPREF0402_04011_HMPREF0402_04011_E_ulcerans:0,
fig_768594rna24_RO08_01535_E_vulneris:0)0:0.00373,
(gi_undefined_HMPREF1766_00665_HMPREF1766_00665_E_vulneris:0,
fig_768595rna53_CBG60_05850_E_vulneris:0)0:0.00373)0.8:0.00701,
fig_7685910rna43_CI114_11510_E_vulneris:0)0.84:0.00717)0:0,
E_fergusonii_otu78:0.0072)0.85:0.00718)0:0,E_vulneris_otu94:0)0.82:0.00753,
E_vulneris_otu77:0)0.82:0.00698,(E_vulneris_otu93:0,((E_vulneris_otu89:0,
E_vulneris_otu90:0.00754)0:0.00765,E_vulneris_otu91:0)0.83:0.01608)0:0)
0.8:0.02319,(((E_vulneris_otu35:0,E_vulneris_otu34:0.00752)0.83:0.00766,
E_vulneris_otu28:0.00688)0:0,(E_vulneris_otu2:0.01715,E_vulneris_otu1:0)
0.89:0.01482)0.8:0.01541)0.89:0.02013,E_periodonticum_otu73:0)0.75:0.01535,
fig_86016rna55_CTM98_06410_E_periodonticum:0.00831)0.97:0.1808,
((((((E_blattae_otu76:0,E_blattae_otu75:0.01744)0.82:0.00698,
(E_blattae_otu4:0.00771,E_blattae_otu39:0)0.8:0.00762)0:0,
((gi_undefined_HMPREF1540_00319_HMPREF1540_00319_E_vulneris:0,
fig_8616rna58_DXA30_07775_E_ulcerans:0)0.81:0.00724,
gi_undefined_C4N16_02505_E_albertii:0)0.92:0.01676)0.78:0.01261,
E_blattae_otu92:0.004)0.78:0.02469,(((E_coli_otu8:0.01561,
E_coli_otu38:0.00378)0:0.00378,E_coli_otu33:0)0:0,
(((E_coli_otu54:0.00713,gi_undefined_C4N19_02700_E_coli:0)
0.73:0.00675,(((E_coli_otu57:0,E_coli_otu43:0.00715)0.84:0.00715,
E_coli_otu53:0)0.79:0.00852,((((E_coli_otu40:0,
E_coli_otu56:0.0076)0:0.00376,E_coli_otu55:0.00703)0:0.00376,
E_coli_otu37:0)0:0.0028,(E_coli_otu41:0,E_coli_otu4:0.00715)
0.9:0.00714)0:0.00395)0.79:0.00862)0.77:0.00764,E_coli_otu36:0)
0.82:0.00761)0.89:0.04396)0.83:0.0832,(gi_undefined_C4N18_07110_E_blattae:0,
gi_undefined_FUSO3_01390_E_hermannii:0.04598)0.92:0.1457)0.97:0.1015);
tree.test<-read.tree(file = "file.newick")

and drew the tree with the ape and phytools packages loaded (the plotting call itself comes from ggtree):

ggtree(tree.test) + geom_tiplab()

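But I don't know how to collapse at the keyword level. Any suggestions would be greatly appreciated. Thanks!

1 answer:

Answer 0: (score: 0)

One approach is to use the ape::drop.tip function to drop every OTU except one in each species group: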

library(ape)

## List of clades
clades <- c("vulneris","ulcerans","blattae","coli","hermannii","albertii","periodonticum","fergusonii")

## New tree placeholder
trimmed_tree <- tree.test

## Loop through each keyword
for(one_clade in clades) {
    ## Find the indices of the tips whose label contains the keyword
    species <- grep(one_clade, trimmed_tree$tip.label)
    ## Drop every matching tip except the first one
    trimmed_tree <- drop.tip(trimmed_tree, trimmed_tree$tip.label[species[-1]])
}

## Displaying the trimmed tree (with one OTU per species)
plot(trimmed_tree)
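
Note that the loop above keeps one tip per keyword whether or not the matching tips sit together in the tree, and in the posted tree the vulneris labels turn up in more than one place. A possible variant, sketched below, only collapses a keyword when its tips form a single clade, and then relabels the surviving tip with the keyword. This is a minimal sketch under assumptions: `collapse_by_keyword` is a hypothetical helper (not an ape function), and `toy_tree` with its labels is made-up data standing in for `tree.test`.

```r
library(ape)

## Toy tree with made-up labels, standing in for tree.test
toy_tree <- read.tree(text = paste0(
    "(((E_coli_otu1:1,E_coli_otu2:1):1,gi_x_E_coli:2):1,",
    "((E_blattae_otu1:1,E_vulneris_otu1:1):1,E_blattae_otu2:2):1);"))
keywords <- c("coli", "blattae", "vulneris")

## Hypothetical helper: collapse each keyword group only if it is a clade
collapse_by_keyword <- function(tree, keywords) {
    for (keyword in keywords) {
        ## Labels that contain the keyword (literal, partial match)
        matched <- grep(keyword, tree$tip.label, fixed = TRUE, value = TRUE)
        ## Skip keywords with fewer than two tips, or whose tips are not a clade
        if (length(matched) < 2 || !is.monophyletic(tree, matched)) next
        ## Keep the first matching tip, drop the others, relabel the survivor
        tree <- drop.tip(tree, matched[-1])
        tree$tip.label[tree$tip.label == matched[1]] <- keyword
    }
    tree
}

collapsed <- collapse_by_keyword(toy_tree, keywords)
print(collapsed$tip.label)
```

On the toy tree only the coli tips form a clade, so only they are merged; the two blattae tips are left alone because a vulneris tip sits between them. Calling `collapse_by_keyword(tree.test, clades)` should apply the same rule to the real tree, and keywords that get skipped point to species whose tips are scattered and may need manual inspection.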