Question

How can I use the following text file

  gene_id           homolog_gene_oid homolog_taxon_oid percent_identity lineage
1 Ga0197852_1000011 2656190422       2654587899        64.10            Hydrogenimonas thermophila_1
2 Ga0197852_1000012 2656190421       2654587899        91.96            Hydrogenimonas thermophila_1
3 Ga0197852_1000013 2656190420       2654587899        89.48            Hydrogenimonas thermophila_2

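to create a table of the relative abundance of genes (the first column of the file above) per lineage (the last column of the file above), with the second column summing to 1, like this?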

lineage                      rel_abund
Hydrogenimonas thermophila_1 0.66
Hydrogenimonas thermophila_2 0.33

Answer 1

It sounds like you just want the proportions of a factor/character vector, so you can wrap table() in prop.table():

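# Sample data typed in from the question; the genus is left out so that
# whitespace-separated parsing keeps each lineage in a single field
mydat <- read.table(text="gene_id       lineage homolog_gene_oid        homolog_taxon_oid       percent_identity
Ga0197852_1000011       thermophila_1   2656190422      2654587899      64.1
Ga0197852_1000012       thermophila_1   2656190421      2654587899      91.96
Ga0197852_1000013       thermophila_2   2656190420      2654587899      89.48
           ", header=TRUE)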

prop.table(table(mydat$lineage))

thermophila_1 thermophila_2 
    0.6666667     0.3333333
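
To make the two steps explicit, here is a minimal sketch. The vector `lineage` is a hypothetical stand-in for `mydat$lineage`, typed in by hand from the question; `table()` counts the genes per lineage and `prop.table()` divides each count by the total.

```r
# Hypothetical stand-in for mydat$lineage, typed in by hand from the question
lineage <- c("Hydrogenimonas thermophila_1",
             "Hydrogenimonas thermophila_1",
             "Hydrogenimonas thermophila_2")

counts <- table(lineage)      # number of genes per lineage
props  <- prop.table(counts)  # each count divided by the grand total

# For a one-way table this is the same as dividing by sum()
manual <- counts / sum(counts)
```

Because every count is divided by the same total, the proportions sum to 1 by construction.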

As a data.frame:

as.data.frame(prop.table(table(mydat$lineage)))

           Var1      Freq
1 thermophila_1 0.6666667
2 thermophila_2 0.3333333

Of course, you can give the columns whatever names you like with names() or colnames().

rel_abundance <- as.data.frame(prop.table(table(mydat$lineage)))
names(rel_abundance) <- c("Lineage", "Rel. Abundance")
rel_abundance

        Lineage Rel. Abundance
1 thermophila_1      0.6666667
2 thermophila_2      0.3333333
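
If you would rather get the column names from the question (lineage and rel_abund) in one step, something along these lines should work. The names `lineage`, `props` and `result` are only illustrative, and the input vector is typed in by hand from the question.

```r
# Hypothetical input vector, typed in by hand from the question
lineage <- c("Hydrogenimonas thermophila_1",
             "Hydrogenimonas thermophila_1",
             "Hydrogenimonas thermophila_2")

props <- prop.table(table(lineage))

# Build the data frame directly with the desired column names
result <- data.frame(
  lineage   = names(props),      # lineage labels taken from the table's names
  rel_abund = as.vector(props),  # proportions as a plain numeric vector
  stringsAsFactors = FALSE
)
```

Names without spaces or punctuation, such as rel_abund, can be accessed as result$rel_abund, whereas a name like "Rel. Abundance" has to be wrapped in backticks.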

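Note that I left "Hydrogenimonas" out of lineage only because I had no spreadsheet application at hand to parse your data (which is why we encourage using dput(), a built-in dataset, or a dataset created in code in your question).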
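
If the real file is tab-delimited (an assumption; the question does not say), the full lineage including the space can be kept by setting the separator explicitly. The sketch below rebuilds the question's three rows as a tab-separated string named `txt`, which is a hypothetical stand-in for the file.

```r
# Assumption: fields are separated by tabs, so a lineage containing a
# space ("Hydrogenimonas thermophila_1") stays in a single field
txt <- paste(
  "gene_id\thomolog_gene_oid\thomolog_taxon_oid\tpercent_identity\tlineage",
  "Ga0197852_1000011\t2656190422\t2654587899\t64.10\tHydrogenimonas thermophila_1",
  "Ga0197852_1000012\t2656190421\t2654587899\t91.96\tHydrogenimonas thermophila_1",
  "Ga0197852_1000013\t2656190420\t2654587899\t89.48\tHydrogenimonas thermophila_2",
  sep = "\n"
)

# sep = "\t" splits on tabs only, not on every run of whitespace
mydat <- read.table(text = txt, header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE)

props <- prop.table(table(mydat$lineage))
```

For an actual file, the same read.table() call would take the file path as its first argument in place of text = txt. Running dput(mydat) afterwards prints a copy-pasteable definition of the data for use in a question.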

Relative abundance of text strings from IMG phylodist file
