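I have two datasets (imported as data frames). The first data frame lists chromosomes and the positions of interest along each chromosome (Number, Qual and dt are just other columns). The data frame is called sam: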
Number Qual chr leftPos dt
3 0 chr1 4105086 255
4 16 chr1 4464364 255
5 16 chr1 4464390 255
6 16 chr1 9655049 255
7 16 chr1 9945004 255
etc
The second dataset (called counts) contains the chromosomes and chromosome positions I am interested in:
Chr Locus
chr1 4105086
chr1 4464364
I want to remove every row of sam that has no corresponding Chr and Locus combination in counts.
The output should look like this:
Number Qual chr leftPos dt
3 0 chr1 4105086 255
4 16 chr1 4464364 255
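I do not want to merge, because I do not want to add extra columns to the original dataset (sam). I only want to drop rows from the first dataset.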
Answer 0 (score: 2)
See if this is what you are looking for:
# sample data
sam = structure(list(Number = 3:7, Qual = c(0L, 16L, 16L, 16L, 16L),
chr = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "chr1", class = "factor"),
leftPos = c(4105086L, 4464364L, 4464390L, 9655049L, 9945004L
), dt = c(255L, 255L, 255L, 255L, 255L)), .Names = c("Number",
"Qual", "chr", "leftPos", "dt"), class = "data.frame", row.names = c(NA,
-5L))
counts = structure(list(Chr = structure(c(1L, 1L), .Label = "chr1", class = "factor"),
Locus = c(4105086L, 4464364L)), .Names = c("Chr", "Locus"
), class = "data.frame", row.names = c(NA, -2L))
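library(dplyr)
# Build a "chr_position" key for every row of sam and keep only the rows
# whose key also appears among the keys built from counts
new_data = sam %>% filter(paste0(chr, "_", leftPos) %in%
                            with(counts, paste0(Chr, "_", Locus)))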
new_data
# Number Qual chr leftPos dt
# 1 3 0 chr1 4105086 255
# 2 4 16 chr1 4464364 255
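The key-pasting filter above can also be written as a filtering join. The following is a minimal sketch, assuming dplyr is installed and that the key columns are plain character and integer vectors (the sample data is rebuilt here with data.frame() rather than the dput output above, which is an assumption made only to keep the example short). semi_join() keeps the rows of sam that have a match in counts and never adds columns from counts.

```r
library(dplyr)

# Sample data rebuilt by hand (assumption: character chr / Chr columns)
sam <- data.frame(
  Number  = 3:7,
  Qual    = c(0L, 16L, 16L, 16L, 16L),
  chr     = "chr1",
  leftPos = c(4105086L, 4464364L, 4464390L, 9655049L, 9945004L),
  dt      = 255L,
  stringsAsFactors = FALSE
)
counts <- data.frame(
  Chr   = "chr1",
  Locus = c(4105086L, 4464364L),
  stringsAsFactors = FALSE
)

# Keep rows of sam whose (chr, leftPos) pair matches a (Chr, Locus) pair
# in counts; only the columns of sam are returned
new_data <- semi_join(sam, counts, by = c("chr" = "Chr", "leftPos" = "Locus"))
new_data
```

Because no string key is built, this variant does not depend on choosing a separator that cannot occur in the chromosome names.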
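Or use merge, as suggested:
# An inner merge on the two key columns keeps only the matching rows
new_data = merge(sam, counts, by.x=c("chr","leftPos"), by.y=c("Chr","Locus"))
# merge() puts the key columns first, so restore the column order of sam
new_data = new_data[, names(sam)]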
# Number Qual chr leftPos dt
# 1 3 0 chr1 4105086 255
# 2 4 16 chr1 4464364 255
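If neither dplyr nor merge is wanted, the same key-matching idea can be sketched in base R. This is only an illustration under the assumption that "_" never appears inside a chromosome name; the variable names keep and filtered are hypothetical, and the sample data is rebuilt by hand for brevity.

```r
# Sample data rebuilt by hand (assumption: character chr / Chr columns)
sam <- data.frame(
  Number  = 3:7,
  Qual    = c(0L, 16L, 16L, 16L, 16L),
  chr     = "chr1",
  leftPos = c(4105086L, 4464364L, 4464390L, 9655049L, 9945004L),
  dt      = 255L,
  stringsAsFactors = FALSE
)
counts <- data.frame(
  Chr   = "chr1",
  Locus = c(4105086L, 4464364L),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE where the row's "chr_position" key exists in counts
keep <- paste(sam$chr, sam$leftPos, sep = "_") %in%
  paste(counts$Chr, counts$Locus, sep = "_")

# Subset rows only; columns and their order are left untouched
filtered <- sam[keep, ]
filtered
```

Unlike merge, logical subsetting leaves the rows in their original order and cannot duplicate a row of sam when counts contains the same locus more than once.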