Comparing and finding overlapping ranges in R

Time: 2014-11-13 09:39:47

Tags: r overlap

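I have two tables, each containing numeric ranges. One table is a subdivision of the other. I want to create a binary column in the first table that shows which of its ranges overlap a range in the second table.

For example: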

df1:
start1   end1
 1       6
 6       8
 9       12
 13      15
 15      19
 19      20

df2:
start2   end2
 2        4
 9        11
 14       18

Result: the first table with an added column showing whether an overlap exists.

  start1   end1   overlap
     1       6       1
     6       8       0
     9       12      1
     13      15      1
     15      19      1
     19      20      0

Thanks.

3 answers:

Answer 0 (score: 4)

You could also try foverlaps from data.table:

library(data.table)
setkey(setDT(df1), start1, end1)
setkey(setDT(df2), start2, end2)
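# which=TRUE returns row-index pairs (xid, yid); yid is NA where a df1 row
# overlaps nothing in df2, and +0 turns the logical into 1/0
df1[,overlap:=foverlaps(df1, df2, which=TRUE)[, !is.na(yid),]+0]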
df1
#   start1 end1 overlap
#1:      1    6       1
#2:      6    8       0
#3:      9   12       1
#4:     13   15       1
#5:     15   19       1
#6:     19   20       0
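
One caveat with the call above: under the default mult="all", a df1 row that overlaps several df2 rows appears once per match in the result, so the vector assigned to overlap would no longer have one entry per df1 row. The following is a minimal sketch of a variant that asks for a single match per row instead. It assumes integer endpoints and rebuilds the example tables so that it runs on its own; the name hit is only illustrative.

```r
library(data.table)

# Example tables from the question (integer endpoints assumed)
df1 <- data.table(start1 = c(1L, 6L, 9L, 13L, 15L, 19L),
                  end1   = c(6L, 8L, 12L, 15L, 19L, 20L))
df2 <- data.table(start2 = c(2L, 9L, 14L),
                  end2   = c(4L, 11L, 18L))

# foverlaps() needs the second table to be keyed on its interval columns
setkey(df2, start2, end2)

# With which = TRUE and mult = "first" the result is one index per df1 row:
# the row number of the first overlapping df2 range, or NA if there is none
hit <- foverlaps(df1, df2, by.x = c("start1", "end1"),
                 which = TRUE, mult = "first")

df1[, overlap := as.integer(!is.na(hit))]
```

Because the result always has nrow(df1) entries, the assignment stays valid even when df2 contains ranges that both hit the same df1 row.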

Answer 1 (score: 3)

With the IRanges package:

library(IRanges)
ir1 = with(df1, IRanges(start1, end1))
ir2 = with(df2, IRanges(start2, end2))
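# countOverlaps() counts the df2 ranges hitting each df1 range; the result
# here is logical (TRUE/FALSE), so wrap it in as.integer() for a 0/1 column
df1$overlap = countOverlaps(ir1, ir2) != 0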

If this is genomic data, the GenomicRanges package is the appropriate choice.

Answer 2 (score: 1)

Here is an approach based on generating sequences:

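# Expand every df2 range into its integers and pool them into one vector
nums <- unlist(apply(df2, 1, Reduce, f = seq))

# A df1 range overlaps if any of its integers appears in that pool
df1$overlap <- as.integer(apply(df1, 1, function(x) any(seq(x[1], x[2]) %in% nums)))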
#   start1 end1 overlap
# 1      1    6       1
# 2      6    8       0
# 3      9   12       1
# 4     13   15       1
# 5     15   19       1
# 6     19   20       0
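
The sequence approach expands every range into its individual integers, so it relies on integer endpoints and can get large for wide ranges. As a hedged alternative (not part of the answer above), the sketch below compares endpoints directly in base R, treating the ranges as closed intervals: [s1, e1] and [s2, e2] overlap when s1 <= e2 and s2 <= e1. The name hits is only illustrative, and the example tables are rebuilt so the block runs on its own.

```r
# Example tables from the question
df1 <- data.frame(start1 = c(1, 6, 9, 13, 15, 19),
                  end1   = c(6, 8, 12, 15, 19, 20))
df2 <- data.frame(start2 = c(2, 9, 14),
                  end2   = c(4, 11, 18))

# hits[i, j] is TRUE when df1 row i and df2 row j overlap as closed
# intervals: start1[i] <= end2[j] and end1[i] >= start2[j]
hits <- outer(df1$start1, df2$end2, "<=") &
        outer(df1$end1, df2$start2, ">=")

# A df1 range is flagged when it overlaps at least one df2 range
df1$overlap <- as.integer(rowSums(hits) > 0)
```

This builds an nrow(df1) by nrow(df2) logical matrix, so it suits small to moderate tables; for large inputs the foverlaps or IRanges answers are likely the better fit.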