通过滑动窗口计算事件的快速方法

时间:2013-12-23 06:57:50

标签: r performance loops vector count

假设我有x = rnorm(100000)而不是做1000长度滑动窗口移动平均线,我想做一个1000长度滑动窗口,计算{{1}的所有时间}高于x

例如,

0.2

对于大x <- rnorm(1004) start <- 1:1000 record <- list() while(start[length(start)] <= length(x)) { record[[length(record) + 1]] <- length(which(x[start] > 0.2))/length(start) start <- start + 1 print(record[[length(record)]]);flush.console() } ,这变得无法区分。什么是高效方法?

3 个答案:

答案 0 :(得分:4)

我的贡献是计算条件累积和之间的滞后差

cumdiff = function(x) diff(c(0, cumsum( x > .2)), 20)

一起
filt = function(x) filter(x > 0.2, rep(1, 20), sides=1)
library(TTR); ttr = function(x) runSum(x > .2, 20)
cumsub = function(x) { z <- cumsum(c(0, x>0.2)); tail(z,-20) - head(z,-20) }

执行正常

> library(microbenchmark)
> set.seed(123); xx = rnorm(100000)
> microbenchmark(cumdiff(xx), filt(xx), ttr(xx), cumsub(xx))
Unit: milliseconds
        expr       min        lq    median       uq      max neval
 cumdiff(xx) 11.192005 12.387862 12.469253 12.77588 13.72404   100
    filt(xx) 20.979503 22.058045 22.442765 23.02612 62.91730   100
     ttr(xx)  8.390923 10.023934 10.119772 10.46309 11.04173   100
  cumsub(xx)  7.015654  8.483432  8.538171  8.73596  9.65421   100

这些不同之处在于结果的具体表示(例如filtttr具有领先的NA),只有filter处理嵌入的NA

> xx[22] = NA
> head(cumdiff(xx))  # NA's propagate, silently
[1]  9  9 NA NA NA NA
> ttr(xx)
Error in runSum(x > 0.2, 20) : Series contains non-leading NAs
> tail(filt(xx), -19)
 [1]  9  9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  8  8  9
 ...

答案 1 :(得分:3)

例如,我使用长度为100的向量并计算20个元素的窗口

x = rnorm(100)
y <- x > 0.2
z <- filter(y, rep(1,20), sides=1)

filter命令就是这里的诀窍。 filter获取向量y并遍历每个元素。 rep(1,20)告诉它一次取20个元素,乘以1,然后求它们(你可以重新(1 / 20,20)得到平均值)。 sides = 1告诉过滤器查看元素的右侧,即考虑当前元素之前的20个元素。

> x
  [1] -0.561107013 -0.687590079 -0.512261471  0.335243662 -2.483070553 -0.995055193  1.711199893  0.165077559 -0.130174687
 [10]  0.671520592 -0.034204031  0.793000050 -1.407499532  0.929966998  0.464658383  1.291478397  1.122711651 -0.732289539
 [19] -0.947808154  1.601688862  0.755147769  0.962415501 -0.254079608 -0.283894519  1.091660426  0.926601058  0.249405592
 [28] -1.195970776 -0.384030894 -0.826048400  1.065626378 -1.935038267  1.464437755  0.675437142  0.201552948  0.367500849
 [37] -0.232627068 -1.472755498 -1.894668101 -1.173382831 -0.339885920 -0.248978811 -0.378980900  1.545109833  0.280077032
 [46]  0.684595064  1.867086057  0.320364730 -1.653199399 -1.365201143 -0.012793364 -0.874006845 -0.271929369 -0.119850417
 [55] -0.361720731  0.361642939  0.676907910  0.816370133  0.685732967 -0.868583473 -1.830861530  1.423500571 -0.760462112
 [64]  0.517177871  0.229760540 -0.187106487 -0.324125717 -0.515600949 -0.772414150 -1.144808696  0.003509157  0.138944467
 [73] -0.386850873  0.268739173  0.348214296  0.431076350 -0.418408389 -1.398493830 -0.331882279 -0.203976286 -0.646023792
 [82] -0.729974273 -0.744186350  0.877314843 -0.994216252  0.022162018 -1.101533562 -1.328293297  0.974986850 -0.152985355
 [91] -0.714722527  0.428360138 -0.055607165 -1.015410904  0.320052726  1.951380333  2.288315801  1.295683645 -2.494534350
[100]  1.171319898
> y
  [1] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
 [21]  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
 [41] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
 [61] FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
 [81] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
> z
Time Series:
Start = 1 
End = 100 
Frequency = 1 
  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  9 10 11 11 10 11 12 12 12 12 11 12 11 12 12 12 12 11 11 11 10
 [41]  9  8  8  9  9  9  9 10 10 10  9  9  8  7  6  6  7  8  9  9  9 10 10 10 10  9  8  7  7  7  7  7  7  8  9  9  8  7  6  6
 [81]  6  5  5  5  4  4  4  4  5  5  5  6  6  5  5  5  6  7  7  8

答案 2 :(得分:1)

一种选择是使用rollaply:

library(zoo)
rollapply(zoo(x), width=1000, FUN=function(x) sum(x>0.2))

<强> UPD

使用TTR时,相同的解决方案更快:

library(TTR)
runSum(rnorm(n)>0.2, 1000)

一些基准测试(f1 - 动物园解决方案,f2 - 原始解决方案,f3 - TTR解决方案):

benchmark(f1(5e4), f2(5e4), f3(5e4), replications=10)
       test replications elapsed relative user.self sys.self user.child sys.child
1 f1(50000)           10   29.39  326.556     29.36        0         NA        NA
2 f2(50000)           10   55.57  617.444     55.50        0         NA        NA
3 f3(50000)           10    0.09    1.000      0.09        0         NA        NA