Calculating a rolling mean with rollapplyr (wrong sign in 'by' argument)

Time: 2018-01-31 21:57:54

Tags: r dataframe data-manipulation

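I have a 10M-row dataset with 3 columns: date, a variable var1, and ID. I am trying to compute the rolling mean of var1 over the past 3 days, excluding the current day.

This is just a small sample of my data frame: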

    date       var1    ID
  <date>       <dbl> <int>
1 2010-01-04 -0.124 10371
2 2010-01-05 -0.162 10371
3 2011-11-25    NaN 13011
4 2016-11-10    NaN 16350
5 2016-11-11 -1.000 16350
6 2016-12-13  1.000 16350
7 2016-12-30  1.000 16517
8 2016-12-27  0.366 16524

structure(list(date = structure(c(14613, 14614, 15303, 17115, 
17116, 17148, 17165, 17162), class = "Date"), var1 = c(-0.124, 
-0.162, NaN, NaN, -1, 1, 1, 0.366), ID = c(10371L, 
10371L, 13011L, 16350L, 16350L, 16350L, 16517L, 16524L)), .Names = c("date", 
"var1", "ID"), row.names = c(NA, -8L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), vars = "ID", drop = TRUE, indices = list(
0:1, 2L, 3:5, 6L, 7L), group_sizes = c(2L, 1L, 3L, 1L, 1L
), biggest_group_size = 3L, labels = structure(list(ID = c(10371L, 
13011L, 16350L, 16517L, 16524L)), row.names = c(NA, -5L), class = "data.frame", 
vars = "ID", drop = TRUE, .Names = "ID"))

My code uses dplyr and rollapplyr, as follows:

library(dplyr)
library(zoo)

newdf = df %>% group_by(ID) %>% mutate(var1.lag1 = lag(var1, n = 1))  %>% 
mutate(avgvar1.3d = rollapplyr(data = var1.lag1,width = 3,FUN = mean,
align = "right",na.rm = T))

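I would like to get NA whenever the rolling window size (3 in this case) is larger than the number of observations in a group. However, I keep running into the following error: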

Error in mutate_impl(.data, dots) : 
Evaluation error: wrong sign in 'by' argument.

Any help would be highly appreciated.
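
For context, the error appears to be tied to groups that have fewer rows than the window width. A minimal sketch for listing such groups is below; toy_df and win are hypothetical stand-ins for illustration, not the real 10M-row data.

```r
library(dplyr)

# Stand-in for the grouped data above (values are illustrative only)
toy_df <- data.frame(
  ID   = c(10371L, 10371L, 13011L, 16350L, 16350L, 16350L),
  var1 = c(-0.124, -0.162, NaN, NaN, -1, 1)
)

win <- 3  # rolling window size used in the question

# Count rows per ID and keep the groups that cannot fill one full window
short_groups <- toy_df %>%
  group_by(ID) %>%
  summarise(n_obs = n()) %>%
  filter(n_obs < win)

short_groups
```

Here IDs 10371 and 13011 are listed, since they have 2 and 1 rows respectively.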

1 Answer:

Answer 0: (score: 1)

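It looks like you need to add partial = TRUE. With it, windows at the start of each group that contain fewer than width values are evaluated on the values available instead of raising an error. After modifying the rollapplyr call, the result looks like this.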

newdf = df %>% group_by(ID) %>% mutate(var1.lag1 = lag(var1, n = 1)) %>%
    mutate(avgvar1.3d = rollapplyr(data = var1.lag1, width = 3, FUN = mean, partial = TRUE,
                                   align = "right", na.rm = TRUE))
newdf

# A tibble: 8 x 5
# Groups: ID [5]
  date         var1    ID var1.lag1 avgvar1.3d
  <date>      <dbl> <int>     <dbl>      <dbl>
1 2010-01-04 -0.124 10371        NA        NaN
2 2010-01-05 -0.162 10371    -0.124     -0.124
3 2011-11-25    NaN 13011        NA        NaN
4 2016-11-10    NaN 16350        NA        NaN
5 2016-11-11  -1.00 16350       NaN        NaN
6 2016-12-13   1.00 16350     -1.00      -1.00
7 2016-12-30   1.00 16517        NA        NaN
8 2016-12-27  0.366 16524        NA        NaN
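
Note that partial = TRUE averages whatever values are available, so short groups get a partial mean (or NaN) rather than NA. If NA is wanted instead, as the question asks, one possible approach is sketched below. This is a minimal sketch under assumptions, not something tested on the full data: roll_mean_or_na is a hypothetical helper name, toy is made-up data, and the sketch relies on rollapplyr's fill argument to pad positions that lack a full window.

```r
library(dplyr)
library(zoo)

# Hypothetical helper (not part of zoo or the original post):
# right-aligned rolling mean that yields NA until a full window is available,
# and all NA when the series is shorter than the window.
roll_mean_or_na <- function(x, width = 3) {
  if (length(x) < width) {
    return(rep(NA_real_, length(x)))
  }
  rollapplyr(x, width = width, FUN = mean, na.rm = TRUE, fill = NA)
}

# Toy data (illustrative only): group "a" has 2 rows, group "b" has 4
toy <- data.frame(
  ID   = c("a", "a", "b", "b", "b", "b"),
  var1 = c(1, 2, 1, 2, 3, 4)
)

res <- toy %>%
  group_by(ID) %>%
  mutate(var1.lag1  = dplyr::lag(var1, n = 1),          # exclude the current row
         avgvar1.3d = roll_mean_or_na(var1.lag1, width = 3)) %>%
  ungroup()

res
```

Group "a" comes back entirely NA because it has fewer than 3 rows. The sketch keeps na.rm = TRUE from the question, so an NA or NaN inside a full window is dropped before averaging; remove it if a window containing missing values should itself be NA.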