Find the indices of vector values that fall within intervals, where the intervals may overlap

Time: 2017-05-21 04:02:03

Tags: r find intervals

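I want to find the indices of the values in a vector that fall within intervals. The intervals are defined by a vector of end values together with either 1) a "look-back" value interval or 2) the preceding N values.

Suppose I have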

x <- c(1, 3, 4, 5, 7, 8, 9, 10, 13, 14, 15, 16, 17, 18) # the vector of interest
v_end <- c(5, 7, 15) # the end values
l <- 3 # look-back value interval
N <- 3 # number of values to look back

What I want is the second and third columns of the following output.

       x i n
 [1,]  1 0 1
 [2,]  3 1 1
 [3,]  4 1 1
 [4,]  5 1 1
 [5,]  7 1 1
 [6,]  8 0 0
 [7,]  9 0 0
 [8,] 10 0 1
 [9,] 13 1 1
[10,] 14 1 1
[11,] 15 1 1
[12,] 16 0 0
[13,] 17 0 0
[14,] 18 0 0

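Note that v_end and l yield three intervals: [2,5], [4,7], [12,15]. [2,5] and [4,7] overlap, so together they are effectively [2,7]. Likewise, v_end and N yield three intervals: [1,5], [3,7], [10,15]. Again there is an overlap.

The task resembles what the function findInterval {base} does, but cannot be solved with it directly.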

1 Answer:

Answer 0: (score: 1)

Having "v_end" and "x" ordered (for the "N" case), the intervals for the "l" case are:

ints = cbind(start = v_end - l, end = v_end)
ints
#     start end
#[1,]     2   5
#[2,]     4   7
#[3,]    12  15

Their overlaps can be grouped with:

overlap_groups = cumsum(c(TRUE, ints[-nrow(ints), "end"] < ints[-1, "start"]))

which can be used to reduce the overlapping intervals:

group_end = cumsum(rle(overlap_groups)$lengths)
group_start = c(1L, group_end[-length(group_end)] + 1L)

ints2 = cbind(start = ints[group_start, "start"], end = ints[group_end, "end"])
ints2
#     start end
#[1,]     2   7
#[2,]    12  15
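
The adjacent-end comparison above works here because every interval has the same width `l`, so the ends are sorted along with the starts. As a minimal sketch for the more general case, assuming intervals of unequal width, a running maximum of the ends can decide where a new group begins. `merge_intervals` is a hypothetical helper name, not part of the original answer.

```r
# Hypothetical helper: merge overlapping closed intervals [start, end].
merge_intervals <- function(start, end) {
  if (length(start) == 0L) {
    return(cbind(start = numeric(0), end = numeric(0)))
  }
  ord <- order(start, end)
  start <- start[ord]
  end <- end[ord]
  # Furthest end seen so far; a new group begins only when the next start
  # lies strictly beyond everything that came before it.
  reach <- cummax(end)
  new_group <- c(TRUE, start[-1] > reach[-length(reach)])
  # The last element of each group carries that group's furthest end.
  last_in_group <- c(new_group[-1], TRUE)
  cbind(start = start[new_group], end = reach[last_in_group])
}

merged <- merge_intervals(c(2, 4, 12), c(5, 7, 15))
# A long first interval that swallows a later, shorter one:
nested <- merge_intervals(c(1, 2, 6), c(10, 3, 8))
```

For the question's data `merged` holds the rows [2,7] and [12,15]; `nested` collapses to the single row [1,10], which a comparison of adjacent ends alone would split in two.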

Then use findInterval:

istart = findInterval(x, ints2[, "start"])
iend = findInterval(x, ints2[, "end"], left.open = TRUE)

i = as.integer((istart - iend) == 1L)
i
# [1] 0 1 1 1 1 0 0 0 1 1 1 0 0 0
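
The two `findInterval` calls count, for each value, how many merged starts are at or below it and how many merged ends are strictly below it; the value lies inside an interval exactly when the first count is one ahead of the second. A small sketch of that step as a reusable function follows. `in_intervals` is a hypothetical name, and the function assumes the intervals passed in are already merged (disjoint) and sorted.

```r
# Hypothetical helper: 1 if x falls in some closed interval [start, end], else 0.
# Assumes the intervals are disjoint and sorted by start.
in_intervals <- function(x, start, end) {
  istart <- findInterval(x, start)                # number of starts <= x
  iend <- findInterval(x, end, left.open = TRUE)  # number of ends < x
  as.integer(istart - iend == 1L)
}

x <- c(1, 3, 4, 5, 7, 8, 9, 10, 13, 14, 15, 16, 17, 18)
i <- in_intervals(x, start = c(2, 12), end = c(7, 15))
i
# [1] 0 1 1 1 1 0 0 0 1 1 1 0 0 0
```

Using `left.open = TRUE` for the ends is what keeps the right boundary inclusive, so that 7 and 15 are still counted as inside.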

For the "N" case, start from:

ints = cbind(start = x[match(v_end, x) - N], end = v_end)
ints
#     start end
#[1,]     1   5
#[2,]     3   7
#[3,]    10  15

Following the steps above, we get:

#.....
n = as.integer((istart - iend) == 1L)
n
# [1] 1 1 1 1 1 0 0 1 1 1 1 0 0 0
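
For reference, the "N" case can be run end to end as in the sketch below, which repeats the grouping and `findInterval` steps from the "l" case. Two things are assumptions rather than part of the original answer: every value of `v_end` is taken to occur in `x` (otherwise `match` returns `NA`), and `pmax` clamps the look-back position at 1 so that an end value near the beginning of `x` does not produce a zero or negative index.

```r
x <- c(1, 3, 4, 5, 7, 8, 9, 10, 13, 14, 15, 16, 17, 18)
v_end <- c(5, 7, 15)
N <- 3

# Position of each end value in x, stepped back N positions (clamped at 1;
# the clamp is an added safeguard, not in the original answer).
pos <- match(v_end, x)
ints <- cbind(start = x[pmax(pos - N, 1L)], end = v_end)

# Group overlapping intervals and reduce each group to one interval.
overlap_groups <- cumsum(c(TRUE, ints[-nrow(ints), "end"] < ints[-1, "start"]))
group_end <- cumsum(rle(overlap_groups)$lengths)
group_start <- c(1L, group_end[-length(group_end)] + 1L)
ints2 <- cbind(start = ints[group_start, "start"], end = ints[group_end, "end"])

# Membership test against the reduced intervals.
istart <- findInterval(x, ints2[, "start"])
iend <- findInterval(x, ints2[, "end"], left.open = TRUE)
n <- as.integer((istart - iend) == 1L)
n
# [1] 1 1 1 1 1 0 0 1 1 1 1 0 0 0
```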

In general, a convenient tool for such operations is the "IRanges" package, where the approach is straightforward:

library(IRanges)

xrng = IRanges(x, x)
i = as.integer(overlapsAny(xrng, reduce(IRanges(v_end - l, v_end), min.gapwidth = 0)))
i
# [1] 0 1 1 1 1 0 0 0 1 1 1 0 0 0
n = as.integer(overlapsAny(xrng, reduce(IRanges(x[match(v_end, x) - N], v_end), min.gapwidth = 0)))
n
# [1] 1 1 1 1 1 0 0 1 1 1 1 0 0 0
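
If installing "IRanges" is not an option, both results can be cross-checked in base R with a brute-force comparison that needs no merging at all, since a value is inside the union of the intervals exactly when it is inside at least one of them. This is only a sketch for verification (`in_any` is a hypothetical name); it builds a `length(x)` by `length(v_end)` matrix, so it suits small inputs.

```r
x <- c(1, 3, 4, 5, 7, 8, 9, 10, 13, 14, 15, 16, 17, 18)
v_end <- c(5, 7, 15)
l <- 3
N <- 3

# Hypothetical helper: 1 where x[k] lies in at least one closed interval
# [start[j], end[j]], else 0. Overlaps need no special handling here.
in_any <- function(x, start, end) {
  hits <- outer(x, start, ">=") & outer(x, end, "<=")
  as.integer(rowSums(hits) > 0)
}

i <- in_any(x, v_end - l, v_end)
n <- in_any(x, x[match(v_end, x) - N], v_end)
cbind(x, i, n)
```

The resulting matrix has the same `i` and `n` columns as the desired output in the question.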