假设我有x = rnorm(100000)
而不是做1000
长度滑动窗口移动平均线,我想做一个1000
长度滑动窗口,计算{{1}的所有时间}高于x
。
例如,
0.2
对于大x <- rnorm(1004)
start <- 1:1000
record <- list()
while(start[length(start)] <= length(x)) {
record[[length(record) + 1]] <- length(which(x[start] > 0.2))/length(start)
start <- start + 1
print(record[[length(record)]]);flush.console()
}
,这变得无法区分。什么是高效方法?
答案 0 :(得分:4)
我的贡献是计算条件累积和之间的滞后差
cumdiff = function(x) diff(c(0, cumsum( x > .2)), 20)
与
一起filt = function(x) filter(x > 0.2, rep(1, 20), sides=1)
library(TTR); ttr = function(x) runSum(x > .2, 20)
cumsub = function(x) { z <- cumsum(c(0, x>0.2)); tail(z,-20) - head(z,-20) }
执行正常
> library(microbenchmark)
> set.seed(123); xx = rnorm(100000)
> microbenchmark(cumdiff(xx), filt(xx), ttr(xx), cumsub(xx))
Unit: milliseconds
expr min lq median uq max neval
cumdiff(xx) 11.192005 12.387862 12.469253 12.77588 13.72404 100
filt(xx) 20.979503 22.058045 22.442765 23.02612 62.91730 100
ttr(xx) 8.390923 10.023934 10.119772 10.46309 11.04173 100
cumsub(xx) 7.015654 8.483432 8.538171 8.73596 9.65421 100
这些不同之处在于结果的具体表示(例如filt
和ttr
具有领先的NA),只有filter
处理嵌入的NA
> xx[22] = NA
> head(cumdiff(xx)) # NA's propagate, silently
[1] 9 9 NA NA NA NA
> ttr(xx)
Error in runSum(x > 0.2, 20) : Series contains non-leading NAs
> tail(filt(xx), -19)
[1] 9 9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 8 8 9
...
答案 1 :(得分:3)
例如,我使用长度为100的向量并计算20个元素的窗口
x = rnorm(100)
y <- x > 0.2
z <- filter(y, rep(1,20), sides=1)
filter
命令就是这里的诀窍。 filter
获取向量y
并遍历每个元素。 rep(1,20)
告诉它一次取20个元素,乘以1,然后求它们(你可以重新(1 / 20,20)得到平均值)。 sides = 1
告诉过滤器查看元素的右侧,即考虑当前元素之前的20个元素。
> x
[1] -0.561107013 -0.687590079 -0.512261471 0.335243662 -2.483070553 -0.995055193 1.711199893 0.165077559 -0.130174687
[10] 0.671520592 -0.034204031 0.793000050 -1.407499532 0.929966998 0.464658383 1.291478397 1.122711651 -0.732289539
[19] -0.947808154 1.601688862 0.755147769 0.962415501 -0.254079608 -0.283894519 1.091660426 0.926601058 0.249405592
[28] -1.195970776 -0.384030894 -0.826048400 1.065626378 -1.935038267 1.464437755 0.675437142 0.201552948 0.367500849
[37] -0.232627068 -1.472755498 -1.894668101 -1.173382831 -0.339885920 -0.248978811 -0.378980900 1.545109833 0.280077032
[46] 0.684595064 1.867086057 0.320364730 -1.653199399 -1.365201143 -0.012793364 -0.874006845 -0.271929369 -0.119850417
[55] -0.361720731 0.361642939 0.676907910 0.816370133 0.685732967 -0.868583473 -1.830861530 1.423500571 -0.760462112
[64] 0.517177871 0.229760540 -0.187106487 -0.324125717 -0.515600949 -0.772414150 -1.144808696 0.003509157 0.138944467
[73] -0.386850873 0.268739173 0.348214296 0.431076350 -0.418408389 -1.398493830 -0.331882279 -0.203976286 -0.646023792
[82] -0.729974273 -0.744186350 0.877314843 -0.994216252 0.022162018 -1.101533562 -1.328293297 0.974986850 -0.152985355
[91] -0.714722527 0.428360138 -0.055607165 -1.015410904 0.320052726 1.951380333 2.288315801 1.295683645 -2.494534350
[100] 1.171319898
> y
[1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE
[21] TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
[41] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE
[61] FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
[81] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE
> z
Time Series:
Start = 1
End = 100
Frequency = 1
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 9 10 11 11 10 11 12 12 12 12 11 12 11 12 12 12 12 11 11 11 10
[41] 9 8 8 9 9 9 9 10 10 10 9 9 8 7 6 6 7 8 9 9 9 10 10 10 10 9 8 7 7 7 7 7 7 8 9 9 8 7 6 6
[81] 6 5 5 5 4 4 4 4 5 5 5 6 6 5 5 5 6 7 7 8
答案 2 :(得分:1)
一种选择是使用rollaply:
library(zoo)
rollapply(zoo(x), width=1000, FUN=function(x) sum(x>0.2))
<强> UPD 强>:
使用TTR时,相同的解决方案更快:
library(TTR)
runSum(rnorm(n)>0.2, 1000)
一些基准测试(f1 - 动物园解决方案,f2 - 原始解决方案,f3 - TTR解决方案):
benchmark(f1(5e4), f2(5e4), f3(5e4), replications=10)
test replications elapsed relative user.self sys.self user.child sys.child
1 f1(50000) 10 29.39 326.556 29.36 0 NA NA
2 f2(50000) 10 55.57 617.444 55.50 0 NA NA
3 f3(50000) 10 0.09 1.000 0.09 0 NA NA