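I have two tables, each containing numeric ranges; one table is a subdivision of the other. I want to add a binary column to the first table that indicates whether each of its ranges overlaps any range in the second table.
For example: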
df1:
start1 end1
1 6
6 8
9 12
13 15
15 19
19 20
df2:
start2 end2
2 4
9 11
14 18
Result: the first table with an added column indicating whether an overlap exists.
start1 end1 overlap
1 6 1
6 8 0
9 12 1
13 15 1
15 19 1
19 20 0
Thanks.
Answer 0 (score: 4)
You could also try foverlaps from the data.table package:
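The expected column follows from the standard test for closed intervals: [a, b] and [c, d] overlap exactly when a <= d and c <= b. That is why [1, 6] overlaps [2, 4], while [6, 8] overlaps nothing in df2 and [19, 20] does not overlap [14, 18]. A minimal base-R sketch of this rule, assuming closed intervals and the column names from the question, recreates the question's data so it runs on its own and compares every df1 range with every df2 range via outer(), so it uses O(n * m) memory:

```r
# Data from the question; each row is a closed interval [start, end]
df1 <- data.frame(start1 = c(1, 6, 9, 13, 15, 19),
                  end1   = c(6, 8, 12, 15, 19, 20))
df2 <- data.frame(start2 = c(2, 9, 14),
                  end2   = c(4, 11, 18))

# hits[i, j] is TRUE when df1 row i and df2 row j share at least one point:
# start1[i] <= end2[j] and end1[i] >= start2[j]
hits <- outer(df1$start1, df2$end2, "<=") & outer(df1$end1, df2$start2, ">=")

# A df1 row is flagged if it overlaps at least one df2 row
df1$overlap <- as.integer(rowSums(hits) > 0)
df1$overlap  # 1 0 1 1 1 0
```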
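library(data.table)
# Convert both tables to data.tables and key them by their interval columns
# (foverlaps needs y keyed; setkey also sorts the rows by start, end)
setkey(setDT(df1), start1, end1)
setkey(setDT(df2), start2, end2)
# With which = TRUE and mult = "first", foverlaps returns one df2 row index per df1 row
# (NA when there is no match), so the result has nrow(df1) elements even when
# a df1 range overlaps several df2 ranges
df1[, overlap := as.integer(!is.na(foverlaps(df1, df2, mult = "first", which = TRUE)))]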
df1
# start1 end1 overlap
#1: 1 6 1
#2: 6 8 0
#3: 9 12 1
#4: 13 15 1
#5: 15 19 1
#6: 19 20 0
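Answer 1 (score: 3)
library(IRanges)  # Bioconductor package
# Build closed integer ranges from each table
ir1 = with(df1, IRanges(start1, end1))
ir2 = with(df2, IRanges(start2, end2))
# countOverlaps() returns, for each range in ir1, how many ranges in ir2 it overlaps
df1$overlap = as.integer(countOverlaps(ir1, ir2) > 0)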
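If this is genomic data, the GenomicRanges package is the appropriate tool.
Answer 2 (score: 1)
Here is an approach that generates the integer sequence covered by each range and checks for shared values: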
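# All integers covered by any range in df2 (assumes integer endpoints)
nums <- unlist(apply(df2, 1, Reduce, f = seq))
# A df1 row overlaps if any integer in its range appears in nums
df1$overlap <- as.integer(apply(df1, 1, function(x) any(seq(x[1], x[2]) %in% nums)))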
# start1 end1 overlap
# 1 1 6 1
# 2 6 8 0
# 3 9 12 1
# 4 13 15 1
# 5 15 19 1
# 6 19 20 0
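The sequence-based approach treats each range as the set of integers it contains, so it assumes integer endpoints and expands every df2 range in memory. A sketch of an alternative that avoids both (not one of the posted answers), assuming closed intervals: sort df2 by start, keep a running maximum of its ends with cummax(), and use base R's findInterval() to count how many df2 ranges start at or before each df1 end. A df1 range [a, b] overlaps some df2 range exactly when the largest end among those ranges is at least a. The helper name overlap_flag is hypothetical.

```r
# overlap_flag is a hypothetical helper: returns 1L for each df1 row that
# overlaps any df2 range, else 0L. Assumes closed intervals and the
# column names from the question (start1/end1, start2/end2).
overlap_flag <- function(df1, df2) {
  ord    <- order(df2$start2)       # findInterval() needs sorted breakpoints
  starts <- df2$start2[ord]
  maxend <- cummax(df2$end2[ord])   # largest end among the first k sorted ranges
  # idx[i] = number of df2 ranges whose start is <= end1[i]
  idx <- findInterval(df1$end1, starts)
  # idx == 0 means no df2 range starts early enough; pmax() keeps indexing valid
  hit <- idx > 0 & maxend[pmax(idx, 1)] >= df1$start1
  as.integer(hit)
}

df1 <- data.frame(start1 = c(1, 6, 9, 13, 15, 19), end1 = c(6, 8, 12, 15, 19, 20))
df2 <- data.frame(start2 = c(2, 9, 14), end2 = c(4, 11, 18))
df1$overlap <- overlap_flag(df1, df2)
df1$overlap  # 1 0 1 1 1 0
```

Using the running maximum rather than the end of the last range found keeps the check correct when df2 ranges are nested or overlap one another; the cost is one sort of df2 plus a binary search per df1 row, roughly O((n + m) log m).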