Question

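I need to find stretches of values greater than 0 in a numeric vector, where each stretch has at least 10 members. I don't want to check every single position, because that would be very time-consuming (the vector has more than 10 million elements).

Here is what I am trying to do (very preliminary, because I can't figure out how to skip increments in a for loop):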

1. Check if x[i] (start position) is positive. 
  a) if positive, check to see if x[i+10] (end position) is positive (since we want at least length 10 of positive integers)
    * if positive, check every position in between to see if positive
    * if negative, move to x[i+11], skip positions (e.g. new start position is x[i+12]) in between start & end positions since we would not get >10 members if negative end position is included. 


x <- rnorm(50, mean=0, sd=4)
for(i in 1:length(x)){
  if(x[i]>0){ # IF START POSITION IS POSITIVE
    flag=1
    print(paste0(i, ": start greater than 1"))
    if(x[i+10]>0){ # IF END POSITION POSITIVE, THEN CHECK ALL POSITIONS IN BETWEEN
      for(j in i+1:i+9){
        if(x[j]>0){ # IF POSITION IS POSITIVE, CHECK NEXT POSITION IF POSITIVE
          print(paste0(j, ": for j1")) 
        }else{ # IF POSITION IS NEGATIVE, THEN SKIP CHECKING & SET NEW START POSITION
          print(paste0(j, ": for j2"))  
          i <- i+11
          break;
        }
      }
    }else{ # IF END POSITION IS NOT POSITIVE, START CHECK ONE POSITION AFTER END POSITION
      i <- i+11
    }
  }
}

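The problem I'm running into is that even when I increment i manually, the for loop overwrites my new value with its own next value of i. Any insight is appreciated.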

Answer 1

I don't know whether this approach is as efficient as Curt F's, but how about

runs <- rle(x>0)

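and then using the regions defined by runs$lengths>10 & runs$values ==TRUE? (Use >= 10 instead if runs of exactly 10 should count, as "at least 10" in the question suggests.)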
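The rle() result only gives run lengths, so the start and end positions still have to be derived. A minimal sketch of one way to do that on a small hand-made vector follows; the names min_len, run_starts and run_ends are illustrative, and it assumes runs of exactly 10 should be kept (>= rather than >).

```r
# Small example vector: positions 3..14 form a positive run of length 12
x <- c(-1, -2, rep(1, 12), -3, rep(2, 4), -1)

runs <- rle(x > 0)

# End position of every run, and the matching start position
run_ends   <- cumsum(runs$lengths)
run_starts <- run_ends - runs$lengths + 1

# Keep positive runs with at least min_len members (min_len is an assumed name)
min_len <- 10
keep <- runs$values & runs$lengths >= min_len

starts <- run_starts[keep]
ends   <- run_ends[keep]
print(data.frame(start = starts, end = ends))
```

Here the only qualifying stretch is x[3] through x[14]; the run of four positive values is dropped.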

Answer 2

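Here is a solution that finds stretches of ten positive numbers in a vector of length ten million. It does not use the looping approach suggested in the OP.

The idea is to take the cumulative sum of the logical expression vec>0. The difference between the cumulative sum at position n and at position n-10 equals 10 only when all ten values at positions n-9 through n are positive.

filter is a simple and relatively fast way to compute these differences.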

#generate random data
vec   <- runif(1e7,-1,1)

#cumulative sum
csvec <- cumsum(vec>0)   

#construct a filter that will find the difference between the nth value with the n-10th value of the cumulative sign vector
f11   <- c(1,rep(0,9),-1)

#apply the filter (stats::filter is spelled out in case dplyr is loaded and masks filter)
fv    <- stats::filter(csvec, f11, sides = 1) 

#find where the difference as computed by the filter is 10
inds  <- which(fv == 10)

#check a few results
> vec[(inds[1]-9):(inds[1])]
 [1] 0.98457526 0.03659257 0.77507743 0.69223183 0.70776891 0.34305865 0.90249491 0.93019927 0.18686722 0.69973176
> vec[(inds[2]-9):(inds[2])]
 [1] 0.0623790 0.8489058 0.3783840 0.8781701 0.6193165 0.6202030 0.3160442 0.3859175 0.8416434 0.8994019
> vec[(inds[200]-9):(inds[200])]
 [1] 0.0605163 0.7921233 0.3879834 0.6393018 0.2327136 0.3622615 0.1981222 0.8410318 0.3582605 0.6530633

#check all the results
> prod(sapply(1:length(inds),function(x){prod(sign(vec[(inds[x]-9):(inds[x])]))}))
[1] 1

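I played around with system.time() to see how long each step takes. On my not-very-powerful laptop, the slowest step is filter(), and for a vector of length ten million it takes only about half a second.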
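To see the cumulative-sum idea on a vector small enough to check by hand, here is a minimal sketch that uses a window of 3 instead of 10. The names v, k, cs0 and win are made up for the illustration. It computes the differences by plain indexing, with a leading 0 so that a window starting at position 1 is also covered, and then cross-checks against stats::filter.

```r
v <- c(-1, 2, 3, 4, -5, 6, 7, 8, 9, -1)
k <- 3  # window length (assumed name); the answer above uses 10

cs  <- cumsum(v > 0)   # running count of positive values
cs0 <- c(0, cs)        # leading 0 stands in for "position 0"

# win[j] is the number of positive values at positions j..(j + k - 1)
win <- cs0[(k + 1):length(cs0)] - cs0[1:(length(cs0) - k)]

# Positions where a window of k positive values ends
window_ends <- which(win == k) + k - 1
print(window_ends)

# Same result from the filter approach (its first k values are NA)
fv <- stats::filter(cs, c(1, rep(0, k - 1), -1), sides = 1)
print(which(fv == k))
```

Each reported position n means v[n-k+1] through v[n] are all positive, so overlapping windows inside one long run are reported separately.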

Answer 3

A vectorized solution using only base commands:

x <- runif(1e7,-1,1)  # generate random vector

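# find boundaries i.e. negatives and zeros; 0 and length(x)+1 are added as
# sentinels so that runs at the very start or end of x are not missed
y <- c(0, which(x<=0), length(x)+1)
dif <- diff(y)  # find distance between consecutive boundaries
drange <- which(dif > 10)  # distance of 11 or more means at least 10 positives in between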

starts <- y[drange]+1  # starting positions of sequence
ends <- y[drange+1]-1  # last positions of sequence

The first range you want is x[starts[1]] through x[ends[1]], and so on.
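As for the literal question of skipping increments: a for loop in R assigns i from its sequence at the start of every iteration, so any change made to i inside the body is lost. A while loop leaves the index under your control. Note also that i+1:i+9 in the question parses as i + (1:i) + 9, so the intended range needs parentheses, (i+1):(i+9). Below is a minimal sketch of the skip-ahead idea with a while loop; find_positive_runs and min_len are hypothetical names, and this is only one possible reading of the question's plan, not a tuned implementation.

```r
find_positive_runs <- function(x, min_len = 10) {
  starts <- integer(0)
  ends <- integer(0)
  n <- length(x)
  i <- 1
  while (i + min_len - 1 <= n) {
    # Offsets (within the current window) of non-positive values
    bad <- which(x[i:(i + min_len - 1)] <= 0)
    if (length(bad) > 0) {
      # Skip ahead: restart just after the last non-positive value in the window
      i <- i + max(bad)
    } else {
      # Window is all positive: extend to the end of the run
      j <- i + min_len - 1
      while (j < n && x[j + 1] > 0) j <- j + 1
      starts <- c(starts, i)
      ends <- c(ends, j)
      i <- j + 2  # x[j + 1] is non-positive (or past the end), so skip it too
    }
  }
  data.frame(start = starts, end = ends)
}

x <- c(rep(1, 10), -1, rep(1, 5), 0, rep(2, 12))
print(find_positive_runs(x))
```

For vectors of ten million elements the vectorized answers above are likely to be much faster than any explicit loop; the sketch is mainly meant to show how the index can be advanced by more than one.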

How do I skip increments in an R for loop?
