使用dplyr和RcppRoll计算所有固定窗口平均值

时间:2018-04-18 21:49:05

标签: r dplyr

我想使用dplyr和RcppRoll计算所有(或至少很多)固定窗口平均值。例如,如果我想根据之前4,5和6个步骤的storms数据计算平均风速,我可以使用以下内容:

library(dplyr)
library(RcppRoll)

set.seed(1)
storms <- storms[storms$name %in% sample(storms$name, size = 4),]

storms %>%
  select(name, year, month, day, hour, wind) %>%
  group_by(name) %>%
  arrange(name, year, month, day, hour) %>%
  mutate_at("wind", .funs = funs(
    "avg_4" = roll_meanr(., n = 4, fill = NA),
    "avg_5" = roll_meanr(., n = 5, fill = NA),
    "avg_6" = roll_meanr(., n = 6, fill = NA)
  ))

这是有效的,但如果我想计算2到20的窗口的所有固定窗口平均值,我就会厌倦复制和粘贴funs()内的行。

似乎我应该能够以某种方式对此进行参数化,但我还没有弄清楚如何。

2 个答案:

答案 0 :(得分:2)

使用Base R,我希望它有所帮助:

storms_wind <- storms %>%
    select(name, year, month, day, hour, wind) %>%
    group_by(name) %>%
    arrange(name, year, month, day, hour)

multi_avg <- function(df, start, end) {
                 for(i in (strat:end)){
                 varname <- paste("avg", i , sep="_")
                 df[[varname]] <- with(df, roll_meanr(wind, n = i, fill = NA))
                }
             df
           }


multi_avg(df=storms_wind, start=4,end=20) 

答案 1 :(得分:2)

只需使用引用和取消引用的功能! 这就是你拥有的:

library(dplyr)
library(RcppRoll)

set.seed(1)
storms <- storms[storms$name %in% sample(storms$name, size = 4),]

storms_subset <- storms %>%
  select(name, year, month, day, hour, wind) %>%
  group_by(name) %>%
  arrange(name, year, month, day, hour) %>%
  mutate_at("wind", .funs = funs(
    "avg_4" = roll_meanr(., n = 4, fill = NA),
    "avg_5" = roll_meanr(., n = 5, fill = NA),
    "avg_6" = roll_meanr(., n = 6, fill = NA)
  ))

现在让我们为不同的roll_meanr(x, n)x构建一系列像n这样的表达式的函数。

make_rollmeans <- function(..., .n = 3) {
  # this line captures vars you typed in
  .dots <- rlang::exprs(...)

  # now you iterate over captured variables...
  q <- purrr::map(.dots, function(.var) {
    # ... and over window sizes
    purrr::map(.n, function(.nn) {
      # for each (variable, window) pair make an expression
      rlang::expr(RcppRoll::roll_meanr(!!.var, !!(.nn)))
    }) %>% 
      # set proper names by combining variable name, "avg", and window size
      purrr::set_names(paste0(as.character(.var), "_avg_", .n))
  }) %>%
    # and finally remove inner structure of list of expressions
    # after that you'll have a list of expressions with depth 1 
    purrr::flatten() 
  q
}

所有的魔力来自rlang::expr(RcppRoll::roll_meanr(!!.var, !!(.nn)))。 使用!!.var,您可以用.var替换输入变量名称,即wind。 使用!!.nn,您可以使用数字替换.nn。 接下来,使用rlang::expr(...)引用表达式。

此函数获取没有""的变量名称和窗口大小的向量。输出如下:

make_rollmeans(wind, pressure, .n = c(3, 5))
#> $wind_avg_3
#> RcppRoll::roll_meanr(wind, 3)
#> 
#> $wind_avg_5
#> RcppRoll::roll_meanr(wind, 5)
#> 
#> $pressure_avg_3
#> RcppRoll::roll_meanr(pressure, 3)
#> 
#> $pressure_avg_5
#> RcppRoll::roll_meanr(pressure, 5)

您可以看到您正在寻找的表达方式。

接下来,您可以使用make_rollmeans(bang-bang-bang)运算符将mutate()置于!!!调用内,以取消对由其构建的表达式的排序。

select(storms_subset, wind) %>% mutate(!!!make_rollmeans(wind, .n = 3:20))
#> Adding missing grouping variables: `name`
#> # A tibble: 261 x 20
#> # Groups:   name [4]
#>    name     wind wind_avg_3 wind_avg_4 wind_avg_5 wind_avg_6 wind_avg_7
#>    <chr>   <int>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#>  1 Ernesto    30       NA         NA          NA        NA         NA  
#>  2 Ernesto    30       NA         NA          NA        NA         NA  
#>  3 Ernesto    30       30.0       NA          NA        NA         NA  
#>  4 Ernesto    35       31.7       31.2        NA        NA         NA  
#>  5 Ernesto    40       35.0       33.8        33.       NA         NA  
#>  6 Ernesto    50       41.7       38.8        37.       35.8       NA  
#>  7 Ernesto    60       50.0       46.2        43.       40.8       39.3
#>  8 Ernesto    55       55.0       51.2        48.       45.0       42.9
#>  9 Ernesto    50       55.0       53.8        51.       48.3       45.7
#> 10 Ernesto    45       50.0       52.5        52.       50.0       47.9
#> # ... with 251 more rows, and 13 more variables: wind_avg_8 <dbl>,
#> #   wind_avg_9 <dbl>, wind_avg_10 <dbl>, wind_avg_11 <dbl>,
#> #   wind_avg_12 <dbl>, wind_avg_13 <dbl>, wind_avg_14 <dbl>,
#> #   wind_avg_15 <dbl>, wind_avg_16 <dbl>, wind_avg_17 <dbl>,
#> #   wind_avg_18 <dbl>, wind_avg_19 <dbl>, wind_avg_20 <dbl>

我希望结果与您的要求相同。 :)