I need to keep a row of `map` only when it falls within one of the intervals in the `ref` table.
Here is an example of the `map` table:
map<-"chr start tag depth BCV State
chr1 1 chr1-1 1 2 1
chr1 2 chr1-2 1 3 2
chr1 3 chr1-3 1 2 3
chr1 4 chr1-4 2 2 4
chr2 5 chr2-5 2 2 5
chr2 1 chr2-1 2 2 6
chr2 2 chr2-2 3 2 4
chr2 3 chr2-3 3 2 3
chr2 4 chr2-4 3 2 2
chr2 5 chr2-5 3 2 1
chr2 6 chr2-6 3 2 7
chr2 7 chr2-7 3 2 9
chr2 8 chr2-8 2 2 2
chr2 9 chr2-9 2 2 1"
map<-read.table(text=map,header=T)
I have a reference table like this example:
ref<-"chr start end
chr1 1 2
chr1 2 3
chr1 5 6
chr2 7 9"
ref<-read.table(text=ref,header=T)
I need a final table like this:
final<-"chr start tag depth BCV State
chr1 1 chr1-1 1 2 1
chr1 2 chr1-2 1 3 2
chr1 3 chr1-3 1 2 3
chr2 7 chr2-7 3 2 9
chr2 8 chr2-8 2 2 2
chr2 9 chr2-9 2 2 1"
final<-read.table(text=final,header=T)
Answer 0 (score: 4)
Since this is tagged data.table, here is a simple data.table::foverlaps solution:
library(data.table)
setDT(map)[, end := start]
setkey(setDT(ref))
indx <- unique(foverlaps(map, ref, which = TRUE, nomatch = 0L)$xid)
map[indx]
# chr start tag depth BCV State end
# 1: chr1 1 chr1-1 1 2 1 1
# 2: chr1 2 chr1-2 1 3 2 2
# 3: chr1 3 chr1-3 1 2 3 3
# 4: chr2 7 chr2-7 3 2 9 7
# 5: chr2 8 chr2-8 2 2 2 8
# 6: chr2 9 chr2-9 2 2 1 9
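This basically adds an `end` column to `map` so that each position becomes a closed interval, and keys the `ref` data set in order to define the intervals to match on for `foverlaps`, while also including `chr`. It then simply runs `foverlaps` while dropping non-matching rows, and keeps only the `unique` row indices in case intervals in `ref` overlap each other. Finally, it subsets `map` by that index.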
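To make the role of `unique` concrete, here is a minimal sketch on a cut-down version of the data. The tables `map_small` and `ref_small` are illustrative stand-ins and not part of the original answer. Position 2 lies inside both `1-2` and `2-3`, so its row index comes back more than once.

```r
library(data.table)

# Illustrative stand-ins for the question's tables (assumed, reduced to chr1)
map_small <- data.table(chr = "chr1", start = 1:4)
map_small[, end := start]   # each position becomes a closed interval [start, end]

ref_small <- data.table(chr = "chr1", start = c(1L, 2L), end = c(2L, 3L))
setkey(ref_small, chr, start, end)   # foverlaps requires the second table to be keyed

# which = TRUE returns row indices (xid into map_small, yid into ref_small)
hits <- foverlaps(map_small, ref_small, which = TRUE, nomatch = 0L)

hits$xid                              # the index of the row at position 2 is repeated
keep <- sort(unique(hits$xid))        # one index per matching row
map_small[keep]
```

Without `unique`, subsetting by `hits$xid` would duplicate every row of `map` that is covered by more than one interval.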
Answer 1 (score: 2)
First, you need to expand the intervals:
L <- lapply(split(ref,ref$chr), function(d) unique(unlist(mapply(seq,d$start,d$end,SIMPLIFY = F))))
which will give you:
#$chr1
#[1] 1 2 3 5 6
#$chr2
#[1] 7 8 9
Then you can merge:
ref2 <- setNames(stack(L),c('start','chr'))
merge(map,ref2)
Final output:
# chr start tag depth BCV State
#1 chr1 1 chr1-1 1 2 1
#2 chr1 2 chr1-2 1 3 2
#3 chr1 3 chr1-3 1 2 3
#4 chr2 7 chr2-7 3 2 9
#5 chr2 8 chr2-8 2 2 2
#6 chr2 9 chr2-9 2 2 1
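The expand-then-merge steps can also be collected into one self-contained sketch in base R. The helper name `keep_in_intervals` and the small tables below are hypothetical, introduced only for illustration. The explicit `as.character` conversion is an added assumption to keep the `chr` columns of the same type on both sides of the merge.

```r
# Hypothetical helper wrapping the expand-and-merge approach
keep_in_intervals <- function(map, ref) {
  # Expand each chromosome's intervals into the set of covered positions
  covered <- lapply(split(ref, ref$chr), function(d) {
    unique(unlist(mapply(seq, d$start, d$end, SIMPLIFY = FALSE)))
  })
  # stack() yields columns `values` and `ind`; rename them to match `map`
  ref2 <- setNames(stack(covered), c("start", "chr"))
  ref2$chr <- as.character(ref2$chr)   # assumption: keep chr as character
  merge(map, ref2)                     # joins on the shared columns chr and start
}

# Illustrative data (assumed, not the question's full tables)
map_small <- data.frame(chr   = c("chr1", "chr1", "chr2", "chr2"),
                        start = c(1L, 4L, 5L, 8L),
                        tag   = c("chr1-1", "chr1-4", "chr2-5", "chr2-8"),
                        stringsAsFactors = FALSE)
ref_small <- data.frame(chr   = c("chr1", "chr2"),
                        start = c(1L, 7L),
                        end   = c(2L, 9L),
                        stringsAsFactors = FALSE)

res <- keep_in_intervals(map_small, ref_small)
res
```

One trade-off to keep in mind: expanding every interval with `seq` materialises each covered position, so memory use grows with interval length, whereas the `foverlaps` approach works on the interval endpoints directly.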