Remove all rows whose column values do not match the column values in another dataset

Time: 2014-12-06 10:03:52

Tags: r

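I have two datasets (imported as data frames). The first data frame is a list of chromosomes and positions of interest along each chromosome (Number, Qual and dt are just other columns). The data frame is called sam: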

  Number   Qual  chr     leftPos     dt
   3        0   chr1    4105086     255
   4       16   chr1    4464364     255
   5       16   chr1    4464390     255
   6       16   chr1    9655049     255
   7       16   chr1    9945004     255
   etc

The second dataset (called counts) contains the chromosomes and chromosome positions I am interested in:

    Chr     Locus
   chr1    4105086
   chr1    4464364

I want to remove every row in sam whose chr and leftPos do not match a Chr and Locus combination in counts.

The output should look like this:

Number   Qual  chr     leftPos     dt
3         0   chr1     4105086      255
4        16   chr1     4464364      255

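I don't want to merge, because I don't want to add extra columns etc. to the original dataset (sam). I only want to exclude rows from the first dataset.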

1 answer:

Answer 0: (score: 2)

See if this is what you are looking for:

# sample data
sam = structure(list(Number = 3:7, Qual = c(0L, 16L, 16L, 16L, 16L), 
    chr = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "chr1", class = "factor"), 
    leftPos = c(4105086L, 4464364L, 4464390L, 9655049L, 9945004L
    ), dt = c(255L, 255L, 255L, 255L, 255L)), .Names = c("Number", 
"Qual", "chr", "leftPos", "dt"), class = "data.frame", row.names = c(NA, 
-5L))

counts = structure(list(Chr = structure(c(1L, 1L), .Label = "chr1", class = "factor"), 
    Locus = c(4105086L, 4464364L)), .Names = c("Chr", "Locus"
), class = "data.frame", row.names = c(NA, -2L))

library(dplyr)
new_data = sam %>% filter(paste0(chr,"_",leftPos) %in%
                            with(counts, paste0(Chr,"_",Locus)))
new_data
#   Number Qual  chr leftPos  dt
# 1      3    0 chr1 4105086 255
# 2      4   16 chr1 4464364 255
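
The same key-matching idea can also be written in base R, without dplyr. The following is only a sketch under a few assumptions: the data frames are rebuilt here with plain character columns so the snippet runs on its own, and the names sam_key, counts_key and base_kept are made up for illustration. It keeps sam's original columns and row order.

```r
# Toy data matching the question (character columns instead of factors: an assumption)
sam <- data.frame(
  Number  = 3:7,
  Qual    = c(0L, 16L, 16L, 16L, 16L),
  chr     = "chr1",
  leftPos = c(4105086L, 4464364L, 4464390L, 9655049L, 9945004L),
  dt      = 255L,
  stringsAsFactors = FALSE
)
counts <- data.frame(
  Chr   = "chr1",
  Locus = c(4105086L, 4464364L),
  stringsAsFactors = FALSE
)

# Build one key per row from the two columns that must match together
sam_key    <- paste(sam$chr, sam$leftPos, sep = "_")
counts_key <- paste(counts$Chr, counts$Locus, sep = "_")

# Keep only the rows of sam whose key also occurs in counts
base_kept <- sam[sam_key %in% counts_key, ]
base_kept
```

Pasting both columns into a single key matters: testing chr and leftPos separately with %in% would also keep rows whose position exists in counts but on a different chromosome.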

Or use merge, as suggested:

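# inner join on the two key columns; the key columns from counts are not duplicated
new_data = merge(sam, counts, by.x=c("chr","leftPos"), by.y=c("Chr","Locus"))
# merge puts the key columns first, so restore sam's original column order
new_data = new_data[, names(sam)]
new_data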
#   Number Qual  chr leftPos  dt
# 1      3    0 chr1 4105086 255
# 2      4   16 chr1 4464364 255
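
Since the question asks to avoid picking up extra columns, a filtering join may be a closer fit than merge. The sketch below uses dplyr's semi_join, which keeps the rows of its first argument that have a match in the second and returns only the first argument's columns. It is not part of the original answer; the data are rebuilt with character columns as an assumption, and the name kept is illustrative.

```r
library(dplyr)

# Toy data matching the question (character columns instead of factors: an assumption)
sam <- data.frame(
  Number  = 3:7,
  Qual    = c(0L, 16L, 16L, 16L, 16L),
  chr     = "chr1",
  leftPos = c(4105086L, 4464364L, 4464390L, 9655049L, 9945004L),
  dt      = 255L,
  stringsAsFactors = FALSE
)
counts <- data.frame(
  Chr   = "chr1",
  Locus = c(4105086L, 4464364L),
  stringsAsFactors = FALSE
)

# Keep rows of sam that have a matching (Chr, Locus) pair in counts;
# the named vector maps sam's column names to the corresponding names in counts
kept <- semi_join(sam, counts, by = c("chr" = "Chr", "leftPos" = "Locus"))
kept
```

Replacing semi_join with anti_join in the same call would return the rows of sam that were dropped, which can be handy for checking the filter.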