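I have a large data frame (more than 60,000 rows). I want to create a new data frame from the 10 rows whose strings exactly match the strings in another data frame I have. How can I do this the "R" way?
The first 5 rows of the large data frame (Saponaria_mean_TPM_gene):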
> Saponaria_mean_TPM_gene
# A tibble: 445,547 x 7
GeneID Flower Flower_bud Old_leaf Root Stem Young_leaf
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 TRINITY_DN0_c0_g1 612. 1202. 2282. 5645. 3645. 1740.
2 TRINITY_DN1_c0_g1 11.2 10.0 63.6 56.8 18.5 26.7
3 TRINITY_DN1_c1_g1 0.0306 0.161 0.719 0.984 5.44 0.174
4 TRINITY_DN1_c2_g1 0.462 0.641 0.799 0.640 1.23 0.595
5 TRINITY_DN1_c4_g1 0.327 0.140 1.13 2.43 1.80 1.54
The strings I want to match (data frame coex_genes):
1 TRINITY_DN10031_c1_g1
2 TRINITY_DN10042_c0_g1
3 TRINITY_DN10042_c0_g3
4 TRINITY_DN10048_c0_g1
5 TRINITY_DN10058_c0_g1
6 TRINITY_DN10067_c5_g1
7 TRINITY_DN100732_c0_g1
8 TRINITY_DN100752_c0_g1
9 TRINITY_DN10093_c1_g5
10 TRINITY_DN100979_c0_g1
For example, the row for TRINITY_DN10031_c1_g1 should be
GeneID Flower Flower_bud Old_leaf Root Stem Young_leaf
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 TRINITY_DN10031_c1_g1 1.78 2.08 0 0.226 0.544 0
I can get it manually with this code:
gene1 <- filter(Saponaria_mean_TPM_gene, (GeneID == "TRINITY_DN10031_c1_g1"))
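How can I write a loop (if that is sensible), or use some other method, to find the 10 genes in coex_genes and build a data frame from their rows?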
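For illustration, one loop-free way to do this is to test membership with `%in%` and subset once. The sketch below is only a sketch under stated assumptions: the two data frames are tiny toy stand-ins built from a few of the values shown above, the name of the ID column in coex_genes is not shown in the post and is assumed here to be `GeneID`, and `coex_TPM` is a hypothetical name for the result.

```r
# Toy stand-ins for the real objects; only a few rows and columns are reproduced.
Saponaria_mean_TPM_gene <- data.frame(
  GeneID = c("TRINITY_DN0_c0_g1", "TRINITY_DN1_c0_g1", "TRINITY_DN10031_c1_g1"),
  Flower = c(612, 11.2, 1.78),
  Root   = c(5645, 56.8, 0.226),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the column of coex_genes that holds the IDs is called GeneID.
coex_genes <- data.frame(
  GeneID = c("TRINITY_DN10031_c1_g1", "TRINITY_DN10042_c0_g1"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose GeneID exactly equals one of the IDs in coex_genes.
keep <- Saponaria_mean_TPM_gene$GeneID %in% coex_genes$GeneID

# Subset once; no loop is needed. coex_TPM is a hypothetical result name.
coex_TPM <- Saponaria_mean_TPM_gene[keep, ]

# Equivalent dplyr calls (these need library(dplyr)):
#   filter(Saponaria_mean_TPM_gene, GeneID %in% coex_genes$GeneID)
#   semi_join(Saponaria_mean_TPM_gene, coex_genes, by = "GeneID")
print(coex_TPM)
```

IDs in coex_genes that do not occur in the large data frame are simply absent from the result, so the output may have fewer rows than coex_genes. Because `%in%` compares whole strings, TRINITY_DN1_c0_g1 does not match TRINITY_DN10031_c1_g1.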