使用锥形窗口计算运行平均值

时间:2018-03-14 09:41:24

标签: r mean zoo running-total

给出(虚拟)向量

index=log(seq(10,20,by=0.5))

我想计算中心窗口的运行平均值和每端有锥形窗口,即第一个条目保持不变,第二个是窗口大小的平均值3,并且依此类推,直到达到指定的窗口大小。

这里给出的答案:Calculating moving average,似乎都会产生一个较短的向量,切断窗口太大的起点和终点,例如:

ma <- function(x,n=5){filter(x,rep(1/n,n), sides=2)}

ma(index)

Time Series:
Start = 1 
End = 21 
Frequency = 1 
[1]       NA       NA 2.395822 2.440451 2.483165 2.524124 2.563466 2.601315
[9] 2.637779 2.672957 2.706937 2.739798 2.771611 2.802441 2.832347 2.861383
[17] 2.889599 2.917039 2.943746       NA       NA

同样适用于

rollmean(index,5)

来自动物园套餐

有没有一种快速的方法来实现锥形窗口而不需要编写循环编码?

3 个答案:

答案 0 :(得分:8)

由于rollapply可能很慢,因此通常需要编写一个简单的定制函数...

tapermean <- function(x, width=5){
                 taper <- pmin(width,
                               2*(seq_along(x))-1,
                               2*rev(seq_along(x))-1) #set taper pattern
                 lower <- seq_along(x)-(taper-1)/2    #lower index for each mean
                 upper <- lower+taper                 #upper index for each mean
                 x <- c(0, cumsum(x))                 #sum x once
                 return((x[upper]-x[lower])/taper)}   #return means

这比rollapply解决方案快了200倍......

library(microbenchmark)
index <- log(seq(10,200,by=0.5)) #longer version for testing
w <- c(seq(1,5,2),rep(5,length(index)-5-1),seq(5,1,-2)) #as in Scarabees answer

microbenchmark(tapermean(index),
               rollapply(index,w,mean))

Unit: microseconds
                   expr       min         lq       mean     median        uq       max neval
       tapermean(index)   185.562   193.9405   246.4123   210.6965   284.548   590.197   100
rollapply(index,w,mean) 48213.027 49681.0715 52053.7352 50583.4320 51756.378 97187.538   100

我休息一下!

答案 1 :(得分:6)

width的{​​{1}}参数可以是数字向量。

因此,在您的示例中,您可以使用:

zoo::rollapply

如果rollapply(index, c(1, 3, 5, rep(5, 15), 5, 3, 1), mean) # [1] 2.302585 2.350619 2.395822 2.440451 2.483165 2.524124 2.563466 2.601315 2.637779 2.672957 2.706937 2.739798 2.771611 2.802441 2.832347 2.861383 # [17] 2.889599 2.917039 2.943746 2.970195 2.995732 是一个奇数,那么一般的解决方案就是:

n

编辑:如果您关心性能,可以使用自定义Rcpp功能:

w <- c(seq(1, n, 2), rep(n, length(index) - n - 1), seq(n, 1, -2))
rollapply(index, w, mean)

新基准:

library(Rcpp)

cppFunction("NumericVector fasttapermean(NumericVector x, const int window = 5) {
  const int n = x.size();
  NumericVector y(n);

  double s = x[0];
  int w = 1;

  for (int i = 0; i < n; i++) {
    y[i] = s/w;
    if (i < window/2) {
      s += x[i + (w+1)/2] + x[i + (w+3)/2];
      w += 2;
    } else if (i > n - window/2 - 2) {
      s -= x[i - (w-1)/2] + x[i - (w-3)/2];
      w -= 2;
    } else {
      s += x[i + (w+1)/2] - x[i - (w-1)/2];
    }
  }

  return y;
}")

但是,如果您关心(极端)精度,则应使用n <- 5 index <- log(seq(10, 200, by = .5)) w <- c(seq(1, n, 2), rep(n, length(index) - n - 1), seq(n, 1, -2)) bench::mark( fasttapermean(index), tapermean(index), zoo::rollapply(index, w, mean) ) # # A tibble: 3 x 14 # expression min mean median max `itr/sec` mem_alloc n_gc n_itr total_time result memory time gc # <chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <bch:tm> <list> <list> <list> <list> # 1 fasttapermean(index) 4.7us 5.94us 5.56us 67.6us 168264. 5.52KB 0 10000 59.4ms <dbl [381]> <Rprofmem [2 x 3]> <bch:tm> <tibble [10,000 x 3]> # 2 tapermean(index) 53.9us 79.68us 91.08us 405.8us 12550. 37.99KB 3 5951 474.2ms <dbl [381]> <Rprofmem [16 x 3]> <bch:tm> <tibble [5,954 x 3]> # 3 zoo::rollapply(index, w, mean) 12.8ms 15.42ms 14.31ms 29.2ms 64.9 100.58KB 8 23 354.7ms <dbl [381]> <Rprofmem [44 x 3]> <bch:tm> <tibble [31 x 3]> 方法,因为R的内置rollapply算法比天真的求和除法更精确。

另请注意,mean方法是唯一允许您根据需要使用rollapply的方法。

答案 2 :(得分:4)

zoo::rollapply()类似,您也可以使用gtools::running()并更改width参数。然而,有趣的是,@ Andrew Gustar的功能仍然更快。

require(tidyverse)
require(gtools)
require(zoo)
require(rbenchmark)

index <- rep(log(seq(10,20,by=0.5)),100)

benchmark("rollapply" = {
  rollapply(index, c(1, 3, 5, rep(5, 15), 5, 3, 1), mean)
},
"tapermean" = {
  tapermean(index)
},
"running" = {
  running(index, fun=mean, width=c(1, 3, 5, rep(5, 15), 5, 3, 1),
          simplify=TRUE) 
},
replications = 1000,
columns = c("test", "replications", "elapsed","user.self",
            "sys.self"))


       test replications elapsed user.self sys.self
1 rollapply         1000   17.67     17.57     0.01
3   running         1000   32.24     32.23     0.00
2 tapermean         1000    0.14      0.14     0.00