Problem subsetting a data frame based on two columns of another data frame?

Time: 2019-03-23 17:52:33

Tags: r subset

I have a first, larger data frame called All_position:

  Sample  CHR   POS REF STR COV ALT_RNA FQ_RNA COV_DNA ALT_DNA FQ_DNA
1 S10726833 chr1 100611200   A   2   8       -      0      33       -   0.00
2 S10702423 chr1 100611200   A   2  82       -      0      53       -   0.00
3 S10342383 chr1 100611200   C   2  83       -      0      45       -   0.00
4 S07016265 chr7 100611200   A   2  23       -      0     131       -   0.00
5 S54340133 chr7 100611200   T   2   2       -      0       4       -   0.00
6 S00950133 chr1 768590      A   2   1       -      0       5       -   0.00
7 S00950433 chr1 76859450    A   2   1       -      0       5       -   0.00
8 S43243244 chr1 14493445    A   2   8       -      0      33       -   0.00
9 SRR107013 chr1 1744707     A   2  82       -      0      53       -   0.00
10 S54340133 chr1 26614716   C   2  83       -      0      45       -   0.00
11 S43242347 chr1 15165451   A   2  23       -      0     131       -   0.00
12 S74637398 chr6 43439788   T   2   2       -      0       4       -   0.00
13 S47894846 chr1 15976522   A   2   1       -      0       5       -   0.00

and a second data frame called Count:

     CHR    POS  ALT_RNA ALT_DNA freq
15  chr1 100611200      AG       -   49
20  chr5 44892324       AG       -   58
220 chr4 363432   TC TA TG       -   59
223 chr3 8934434        AG       -   53
259 chr1 768590         AT       -   65

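I need to filter the All_position data frame on the CHR and POS columns of the Count data frame, keeping only the rows whose CHR and POS match the same row of Count, to get a result like this: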

  Sample  CHR   POS REF STR COV ALT_RNA FQ_RNA COV_DNA ALT_DNA FQ_DNA
1 S10726833 chr1 100611200   A   2   8       -      0      33       -   0.00
2 S10702423 chr1 100611200   A   2  82       -      0      53       -   0.00
3 S10342383 chr1 100611200   C   2  83       -      0      45       -   0.00
6 S00950133 chr1 768590      A   2   1       -      0       5       -   0.00

So far I have tried several approaches based on code found on this site:

All_pos_subset <- All_position[which((All_position$CHR %in% Count$CHR) & (All_position$POS %in% Count$POS)), ]

All_pos_subset <- All_position[c((All_position$CHR %in% Count$CHR) & (All_position$POS %in% Count$POS)), ]

All_pos_subset <- All_position %>% filter((CHR %in% Count$CHR & POS %in% Count$POS))

All_pos_subset <- subset(All_position, ((All_position$CHR %in% Count$CHR) & (All_position$POS %in% Count$POS)) )

But what I get is this:

  Sample  CHR   POS REF STR COV ALT_RNA FQ_RNA COV_DNA ALT_DNA FQ_DNA
1 S10726833 chr1 100611200   A   2   8       -      0      33       -   0.00
2 S10702423 chr1 100611200   A   2  82       -      0      53       -   0.00
3 S10342383 chr1 100611200   C   2  83       -      0      45       -   0.00
4 S07016265 chr7 100611200   A   2  23       -      0     131       -   0.00
5 S54340133 chr7 100611200   T   2   2       -      0       4       -   0.00
6 S00950133 chr1 768590      A   2   1       -      0       5       -   0.00

Why are the chr7 100611200 positions in my result? What is wrong with my code?

Thanks.
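A likely explanation, offered as a sketch rather than a confirmed diagnosis: all four attempts test CHR and POS independently. A row passes when its CHR appears anywhere in Count$CHR and its POS appears anywhere in Count$POS, even if those two values never occur together in one row of Count. The excerpt of Count shown above has no chr7 row, but its row names run up to 259, so the full data frame presumably contains chr7 at some other position. The example below assumes exactly that: the chr7 row in Count (position 5000) is hypothetical and only there to reproduce the symptom. It also assumes POS has the same type (integer here) in both data frames, and it keeps only the columns relevant to the match.

```r
# Reduced stand-in for All_position (only the columns needed for matching).
All_position <- data.frame(
  Sample = c("S10726833", "S10702423", "S10342383", "S07016265",
             "S54340133", "S00950133", "S00950433"),
  CHR    = c("chr1", "chr1", "chr1", "chr7", "chr7", "chr1", "chr1"),
  POS    = c(100611200L, 100611200L, 100611200L, 100611200L,
             100611200L, 768590L, 76859450L),
  stringsAsFactors = FALSE
)

# Stand-in for Count. The last row (chr7, 5000) is HYPOTHETICAL: it is not in
# the excerpt shown in the question and is assumed here to reproduce the issue.
Count <- data.frame(
  CHR = c("chr1", "chr5", "chr4", "chr3", "chr1", "chr7"),
  POS = c(100611200L, 44892324L, 363432L, 8934434L, 768590L, 5000L),
  stringsAsFactors = FALSE
)

# Independent tests, as in the attempts above: chr7 rows slip through because
# "chr7" is somewhere in Count$CHR and 100611200 is somewhere in Count$POS.
independent <- All_position$CHR %in% Count$CHR & All_position$POS %in% Count$POS
wrong <- All_position[independent, ]

# Pair-wise test: build one key per row from CHR and POS, then match the keys.
key_all   <- paste(All_position$CHR, All_position$POS, sep = ":")
key_count <- paste(Count$CHR, Count$POS, sep = ":")
All_pos_subset <- All_position[key_all %in% key_count, ]

print(wrong)
print(All_pos_subset)
```

With this stand-in data, `wrong` still contains the two chr7 rows, while `All_pos_subset` keeps only the rows whose (CHR, POS) pair occurs in Count. If dplyr is already loaded, `semi_join(All_position, Count, by = c("CHR", "POS"))` should express the same pair-wise filter; `merge()` on both columns is another base R option, though it reorders rows and appends the remaining columns of Count.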

0 answers:

No answers