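Find points in a vector where the value changes by more than a threshold

Time: 2017-08-24 15:56:09

Tags: r

I want to find the positions in a vector where the value differs from an earlier point in the vector by more than some threshold. The first changepoint should be measured relative to the first value in the vector. Each subsequent changepoint should be measured relative to the previous changepoint.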

I can do this with a for loop, but I wonder whether there is a more idiomatic and faster vectorised solution.

Minimal example:
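The example code itself is not preserved in this copy. As a rough sketch of the loop described above, mirroring the base R loop that answer 0 benchmarks: the seed and threshold are taken from that answer's data block, while the vector length of 500 is an assumption.

```r
set.seed(123)            # seed as used in the benchmark data of answer 0
x <- cumsum(rnorm(500))  # random walk; the length 500 is an assumption
mindiff <- 5.0           # threshold as used in answer 0

start <- x[1]              # reference value: the first element to begin with
changepoints <- integer()  # positions of the detected changepoints

for (i in seq_along(x)) {
  # A changepoint is any point further than mindiff from the reference value
  if (abs(x[i] - start) > mindiff) {
    changepoints <- c(changepoints, i)
    start <- x[i]          # later points are measured against this one
  }
}

changepoints
```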


3 answers:

Answer 0 (score: 3)

Implementing the same code in Rcpp can improve the speed.

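library(Rcpp)
cppFunction(
  "IntegerVector foo(NumericVector vect, double difference){
    // index of the current reference value (0-based)
    int start = 0;
    IntegerVector changepoints;
    for (int i = 0; i < vect.size(); i++){
      // true when vect[i] is more than difference away from the reference
      if((vect[i] - vect[start]) > difference || (vect[start] - vect[i]) > difference){
        // store the 1-based position so it matches R indexing
        changepoints.push_back (i+1);
        start = i;
      }
    }
    return(changepoints);
  }"
  )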

foo(vect = x, difference = mindiff)
# [1]  17  25  56  98 108 144 288 297 307 312 403 470 487

identical(foo(vect = x, difference = mindiff), changepoints)
#[1] TRUE

Benchmark

#DATA
set.seed(123)
x = cumsum(rnorm(1e5))
mindiff = 5.0

library(microbenchmark)
microbenchmark(baseR = {start = x[1]
changepoints = integer()

for (i in 1:length(x)) {
    if (abs(x[i] - start) > mindiff) {
        changepoints = c(changepoints, i)
        start = x[i]
    }
}}, Rcpp = foo(vect = x, difference = mindiff))
#Unit: milliseconds
#  expr        min        lq      mean    median        uq      max neval cld
# baseR 117.194668 123.07353 125.98741 125.56882 127.78463 139.5318   100   b
#  Rcpp   7.907011  11.93539  14.47328  12.16848  12.38791 263.2796   100  a 
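The base R loop above grows changepoints with c() on every hit, which copies the vector each time. A minimal sketch of a variant that marks hits in a preallocated logical vector instead; find_changepoints_prealloc is a hypothetical name, and it has not been timed here, so any speed-up is an assumption that would need its own benchmark.

```r
# Hypothetical helper: the same scan as the benchmarked base R loop, but
# hits are recorded in a preallocated logical vector rather than via c().
find_changepoints_prealloc <- function(x, mindiff) {
  if (length(x) == 0L) return(integer(0))
  is_change <- logical(length(x))  # all FALSE to begin with
  start <- x[1]                    # current reference value
  for (i in seq_along(x)) {
    if (abs(x[i] - start) > mindiff) {
      is_change[i] <- TRUE
      start <- x[i]
    }
  }
  which(is_change)                 # integer positions of the hits
}

# Toy input: 6 is more than 5 from 0, then 0 from 6, then 12 from 0
find_changepoints_prealloc(c(0, 1, 6, 7, 0, 0.5, 12), mindiff = 5)
# returns 3 5 7
```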

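Answer 1 (score: 3)

Here is a solution that uses only base R's Reduce. With the argument accumulate = TRUE, Reduce returns the result of every call to the function. In our case, that gives the value that start holds at each iteration of the for loop solution. Once you have this vector, you only need to find the indexes where the value changes: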

# Find the changepoints: r[i] is the reference value in force at position i
r <- Reduce(function(a, e) {
  if (abs(e - a) > mindiff)
    e
  else
    a
}, x, accumulate = TRUE)

# Get the indexes using diff
# changepoints <- head(cumsum(c(1,rle(r)$lengths)),-1)
changepoints <- which(diff(r) != 0) + 1

Edit: I updated the answer using @Eric Watt's comment.
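To make the accumulate step concrete, here is a small worked sketch on a toy vector. The names x_small, mindiff_small and r_small are hypothetical and not part of the original answer.

```r
x_small <- c(0, 1, 6, 7, 0, 0.5, 12)  # hypothetical toy input
mindiff_small <- 5

# Each accumulated element is the reference value in force at that position
r_small <- Reduce(function(a, e) if (abs(e - a) > mindiff_small) e else a,
                  x_small, accumulate = TRUE)
r_small
# values: 0 0 6 6 0 0 12

# The reference value changes exactly at the changepoints
which(diff(r_small) != 0) + 1
# values: 3 5 7
```

The reference value can never jump to an equal value, because a jump requires a difference larger than the threshold, so diff is non-zero at every changepoint as long as the threshold is not negative.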

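Answer 2 (score: 0)

For completeness: with recursion we can get an answer that uses only R's vectorised functions. However, this does not work when the result vector is large. For example, with the OP's example and length(x) == 1e5, we get an "evaluation nested too deeply" error.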
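f.recurs = function(x, mindiff, i=1) {
  N = length(x)  # computed inside the function instead of relying on a global N
  # position of the first later point more than mindiff away from x[i] (NA if none)
  next.i = i + which(abs(x[i:N]-x[i]) > mindiff)[1] - 1L
  if (!is.na(next.i)) c(next.i, f.recurs(x, mindiff, next.i))
  else NULL
}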

f.recurs(x, 5.0)
# [1]  17  25  56  98 108 144 288 297 307 312 403 470 487
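The same jump-to-the-next-hit idea can be written with a loop instead of recursion, which avoids the nesting limit. This is a minimal sketch, not part of the original answer; find_changepoints_jump is a hypothetical name. Each step still scans the remainder of the vector, so it may be slow when there are many changepoints.

```r
# Hypothetical helper: iterative form of the recursive search above.
find_changepoints_jump <- function(x, mindiff) {
  n <- length(x)
  out <- integer(0)
  i <- 1L                          # position of the current reference value
  repeat {
    # offset of the first point from i onwards that is more than mindiff
    # away from x[i]; NA when there is no such point
    hit <- which(abs(x[i:n] - x[i]) > mindiff)[1]
    if (is.na(hit)) break
    i <- i + hit - 1L              # convert the offset to a position in x
    out <- c(out, i)
  }
  out
}

find_changepoints_jump(c(0, 1, 6, 7, 0, 0.5, 12), mindiff = 5)
# returns 3 5 7
```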