Filling the leading NAs of a moving average with values from the adjacent column

Asked: 2018-05-31 20:08:07

Tags: r dplyr fill zoo moving-average

library(dplyr)

set.seed(123)
df <- data.frame(loc.id = rep(1:3, each = 4 * 10),
                 year = rep(rep(1980:1983, each = 10), times = 3),
                 day = rep(1:10, times = 3 * 4),
                 x = sample(123:200, 4 * 3 * 10, replace = TRUE))

I want to add another column, x.mv, holding the 3-day moving average of x for each combination of loc.id and year.

df %>% group_by(loc.id, year) %>% mutate(x.mv = zoo::rollmean(x, 3, fill = NA, align = "right"))

   loc.id  year   day     x  x.mv
    <int> <int> <int> <int> <dbl>
 1      1  1980     1   145   NA 
 2      1  1980     2   184   NA 
 3      1  1980     3   154  161 
 4      1  1980     4   191  176.
 5      1  1980     5   196  180.
 6      1  1980     6   126  171 
 7      1  1980     7   164  162 
 8      1  1980     8   192  161.
 9      1  1980     9   166  174 
10      1  1980    10   158  172 

What I want is to replace the NAs in x.mv with the corresponding values from the x column. I tried this:

df %>% group_by(loc.id, year) %>% mutate(x.mv = zoo::rollmean(x, 3, fill = x[1:2], align = "right"))

   loc.id  year   day     x  x.mv
    <int> <int> <int> <int> <dbl>
 1      1  1980     1   145  145 
 2      1  1980     2   184  145 
 3      1  1980     3   154  161 
 4      1  1980     4   191  176.
 5      1  1980     5   196  180.
 6      1  1980     6   126  171 
 7      1  1980     7   164  162 
 8      1  1980     8   192  161.
 9      1  1980     9   166  174 
10      1  1980    10   158  172 

But this fills both NAs with the first x value instead of the corresponding x values. How can I fix this?

1 Answer:

Answer 0: (score: 2)

Skip the fill argument and fill the leading values manually:

df %>%
  group_by(loc.id, year) %>%
  mutate(x.mv = c(x[1:2], zoo::rollmean(x, 3, align = "right"))) %>%
  ungroup()

# # A tibble: 120 x 5
#   loc.id  year   day     x     x.mv
#    <int> <int> <int> <int>    <dbl>
# 1      1  1980     1   145 145.0000
# 2      1  1980     2   184 184.0000
# 3      1  1980     3   154 161.0000
# 4      1  1980     4   191 176.3333
# 5      1  1980     5   196 180.3333
# 6      1  1980     6   126 171.0000
# 7      1  1980     7   164 162.0000
# 8      1  1980     8   192 160.6667
# 9      1  1980     9   166 174.0000
# 10     1  1980    10   158 172.0000
# # ... with 110 more rows
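
As for why the attempt in the question repeats the first value: as far as the zoo documentation describes it, `fill` is treated as three components (left of the data, interior, right of the data), and shorter inputs are recycled. So `fill = x[1:2]` is read as left = `x[1]`, interior = `x[2]`, rather than as one value per missing position. A minimal sketch of that behaviour, using the first five x values from the output above typed in by hand (the names `filled` and `manual` are only for illustration):

```r
library(zoo)

# First five x values from the output above, copied by hand
x <- c(145, 184, 154, 191, 196)

# fill = x[1:2] is recycled to (left, interior, right) = (145, 184, 145),
# so both leading positions receive the "left" value x[1]
filled <- rollmean(x, 3, fill = x[1:2], align = "right")

# Prepending the raw values keeps each position's own x
manual <- c(x[1:2], rollmean(x, 3, align = "right"))

print(filled)
print(manual)
```

The two results should differ only in the second element: 145 from the recycled fill versus 184 from the manual approach.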

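You may want to use dplyr::cummean(x[1:2]) instead of x[1:2] so that the second value is the mean of the first two observations. Alternatively, in this case, follow @g-grothendieck's suggestion from the comments and rewrite your mutate call as mutate(x.mv = rollapplyr(x, 3, mean, partial = TRUE)), where rollapplyr comes from zoo.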
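
To make the difference between these variants concrete, here is a small sketch on a plain vector, assuming current versions of zoo and dplyr. The vector is the first five x values from the output above, and the names `raw_head`, `cum_head` and `partial` are made up for the example.

```r
library(zoo)
library(dplyr)

# First five x values from the output above, copied by hand
x <- c(145, 184, 154, 191, 196)

# Raw leading values followed by the complete 3-day windows
raw_head <- c(x[1:2], rollmean(x, 3, align = "right"))

# Running mean for the leading values: 145, then mean(145, 184) = 164.5
cum_head <- c(cummean(x[1:2]), rollmean(x, 3, align = "right"))

# Partial windows: the mean over however many values are available so far
partial <- rollapplyr(x, 3, mean, partial = TRUE)

print(raw_head)
print(cum_head)
print(partial)
```

On this input the `cummean` variant and the `partial = TRUE` variant should agree, while the raw variant keeps 184 in the second position. The `partial = TRUE` form also avoids hard-coding the number of leading values, so it adapts if the window width changes.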