Find contiguous subvectors of length k from a numeric vector that satisfy a given condition

Time: 2014-08-23 16:14:58

Tags: r

I have a numeric vector in R, say

v= c(2,3,5,6,7,6,3,2,3,4,5,7,8,9,6,1,1,2,5,6,7,11,2,3,4)

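Now I have to find all contiguous subvectors of size 4 such that every element of the subvector is greater than 2, and all subvectors must be disjoint in the sense that no two subvectors may contain the same index element. So my output would be: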

(3,5,6,7),(3,4,5,7),(5,6,7,11)

Edit: additional examples for illustration. For

v=c(3,3,3,3,1,3,3,3,3,3,3,3,3) 

the output would be:

(3,3,3,3), (3,3,3,3),(3,3,3,3).

And for

v= c(2,3,5,5,7,6,3,2,3,4,5,7,8,9,6,1,1,2,5,6,7,11,2,3,4) 

the output would be:

(3,5,5,7),(3,4,5,7),(5,6,7,11)

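The second condition on the output simply says that if we find any subarray, say (v[m], v[m+1], v[m+2], v[m+3]), in which every element is greater than 2, it goes into the output, and the next subarray can only start from v[m+4] (if possible).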
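
The rule described above can be read as a greedy left-to-right scan. The following is only a minimal sketch of that reading, not code from the original post; the function name `find_windows` and its arguments `k` and `thres` are hypothetical names introduced for illustration.

```r
# Hypothetical helper: greedy scan for disjoint windows of length k
# whose elements are all greater than thres.
find_windows <- function(v, k, thres) {
  out <- list()
  i <- 1
  while (i + k - 1 <= length(v)) {
    w <- v[i:(i + k - 1)]
    if (all(w > thres)) {
      out[[length(out) + 1]] <- w
      i <- i + k      # the next window may only start after this one ends
    } else {
      i <- i + 1      # slide one position to the right and try again
    }
  }
  out
}

v <- c(2,3,5,6,7,6,3,2,3,4,5,7,8,9,6,1,1,2,5,6,7,11,2,3,4)
find_windows(v, 4, 2)
```

On the vector from the question this should yield the three windows listed there, (3,5,6,7), (3,4,5,7) and (5,6,7,11).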

3 answers:

Answer 0: (score: 5)

This solution uses embed() to create a matrix of lagged windows, then extracts the desired rows from that matrix:

v <- c(2,3,5,6,7,6,3,2,3,4,5,7,8,9,6,1,1,2,5,6,7,11,2,3,4)

e <- embed(v, 4)
ret <- which(
  apply(e, 1, function(x)all(x > 2)) &
  apply(e, 1, function(x)length(unique(x)) == 4)
)
rows <- ret[c(1, 1 + which(diff(ret) > 4))]

e[rows, 4:1]

     [,1] [,2] [,3] [,4]
[1,]    3    5    6    7
[2,]    3    4    5    7
[3,]    5    6    7   11
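
To see why the answer indexes the columns as 4:1, it may help to look at what embed() returns on a small input. This is a toy illustration on an assumed vector `x`, not part of the original answer: each row holds one window of the input, but in reverse order.

```r
x <- 1:6
e <- embed(x, 4)
e
#      [,1] [,2] [,3] [,4]
# [1,]    4    3    2    1
# [2,]    5    4    3    2
# [3,]    6    5    4    3

# Reversing the columns restores the original left-to-right order,
# so row i becomes x[i:(i + 3)].
e[, 4:1]
```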

Answer 1: (score: 1)

Try:

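fun1 <- function(vec, n, cond1) {
  # Collect every length-n window that is non-decreasing and entirely above cond1
  lst1 <- lapply(1:(length(vec) - n + 1), function(i) {
    x1 <- vec[i:(i + (n - 1))]
    if (all(diff(x1) >= 0) & all(x1 > cond1))
      x1
  })
  # Start positions of the windows that passed the test
  indx <- which(sapply(lst1, length) == n)
  # Group consecutive start positions, then step through each group by n - 1
  indx2 <- unlist(lapply(split(indx, cumsum(c(TRUE, diff(indx) != 1))),
                         function(x) x[seq(1, length(x), by = n - 1)]))
  lst1[indx2]
}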


v1 <- c(3,3,3,3,1,3,3,3,3,3,3,3,3)
v2 <- c(2,3,5,5,7,6,3,2,3,4,5,7,8,9,6,1,1,2,5,6,7,11,2,3,4)
v3 <- c(2,3,5,6,7,6,3,2,3,4,5,7,8,9,6,1,1,2,5,6,7,11,2,3,4)

fun1(v1,4,2)
#[[1]]
#[1] 3 3 3 3

#[[2]]
#[1] 3 3 3 3

#[[3]]
#[1] 3 3 3 3

 fun1(v2,4,2)
 #[[1]]
 #[1] 3 5 5 7

#[[2]]
#[1] 3 4 5 7

#[[3]]
#[1]  5  6  7 11

fun1(v3,4,2)
#[[1]]
#[1] 3 5 6 7

#[[2]]
#[1] 3 4 5 7

#[[3]]
#[1]  5  6  7 11
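
The grouping step inside fun1 relies on a common idiom: cumsum(c(TRUE, diff(indx) != 1)) assigns one label to each run of consecutive integers. A small sketch on illustrative index values (assumed here, not taken from the answer):

```r
indx <- c(1, 6, 7, 8, 9, 10)

# A new group starts wherever the gap to the previous index is not 1.
grp <- cumsum(c(TRUE, diff(indx) != 1))
grp
# [1] 1 2 2 2 2 2

# split() then returns one vector of indices per run.
split(indx, grp)
```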

Answer 2: (score: 0)

Here is another idea, based on rle:

ff = function(x, size, thres)
{
   valid_subsets = sapply(head(seq_along(x), -(size - 1)), 
                          function(i) all(x[i:(i + (size - 1))] > thres))
   r = rle(valid_subsets)

   lapply(unlist(mapply(function(a, b) a + (seq_len(b) - 1) * size, 
                        (cumsum(r$lengths) - r$lengths + 1)[which(r$values)], 
                        (r$lengths[which(r$values)] + size - 1) %/% size)), 
          function(i) x[i:(i + (size - 1))])
}

ff(c(3,3,3,3,1,3,3,3,3,3,3,3,3), 4, 2)
ff(c(2,3,5,6,7,6,3,2,3,4,5,7,8,9,6,1,1,2,5,6,7,11,2,3,4), 4, 2)
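
The arithmetic on the rle result can be traced by hand. The sketch below assumes the logical vector `valid` that the first call above would compute for size 4 (TRUE where a window starting at that position has all elements above the threshold); the names `valid`, `starts` and `counts` are introduced here for illustration only.

```r
size <- 4
valid <- c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)
r <- rle(valid)

# First position of each run of TRUEs.
starts <- (cumsum(r$lengths) - r$lengths + 1)[r$values]
starts
# [1] 1 6

# A run of L valid start positions spans L + size - 1 elements,
# which leaves room for (L + size - 1) %/% size disjoint windows.
counts <- (r$lengths[r$values] + size - 1) %/% size
counts
# [1] 1 2
```

Under these assumptions the windows would start at positions 1, 6 and 10, which agrees with the three (3,3,3,3) windows expected in the question.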

Testing on another vector (I assume this is the correct output):

set.seed(4); xx = sample(1:10, 20, T)
xx
# [1]  6  1  3  3  9  3  8 10 10  1  8  3  2 10  5  5 10  6 10  8
ff(xx, 4, 2)
#[[1]]
#[1] 3 3 9 3
#
#[[2]]
#[1] 10  5  5 10

Unless I am missing something, the other posted answers do not seem to work for "xx" (and other cases):

fun1(xx, 4, 2)
#[[1]]
#[1]  3  8 10 10

#e[rows, 4:1]
#[1]  9  3  8 10
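
One plausible reason for the discrepancy is that the other two answers test extra conditions that the question does not ask for: fun1 requires each window to be non-decreasing, and the embed() answer requires four distinct values. A quick check on the first window returned by ff (the name `w` is introduced here for illustration):

```r
w <- c(3, 3, 9, 3)

all(w > 2)               # the condition from the question
# [1] TRUE
all(diff(w) >= 0)        # extra condition used in fun1
# [1] FALSE
length(unique(w)) == 4   # extra condition used in the embed() answer
# [1] FALSE
```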