Calculate a rolling sum by group

Time: 2020-08-30 22:22:36

Tags: r dataframe

I want to calculate a rolling sum (or a custom function) of 3 previous values, treating each group separately. I have tried this:

library(dplyr)
library(zoo)  # rollsumr() comes from zoo

# Build dataframe
df <- data.frame(person = c(rep("Peter", 5), rep("James", 5)),
                 score1 = c(1,3,2,5,4,6,8,4,5,3),
                 score2 = c(1,1,1,5,1,3,4,8,9,0))

# Attempt rolling sum by group
df %>% 
  group_by(person) %>% 
  mutate(s1_rolling = rollsumr(score1, k = 3, fill = NA),
         s2_rolling = rollsumr(score2, k = 3, fill = NA))

But the new columns do not treat each group separately; the window keeps running across the whole dataset:

   person score1 score2 s1_rolling s2_rolling
   <chr>   <dbl>  <dbl>      <dbl>      <dbl>
 1 Peter       1      1         NA         NA
 2 Peter       3      1         NA         NA
 3 Peter       2      1          6          3
 4 Peter       5      5         10          7
 5 Peter       4      1         11          7
 6 James       6      3         15          9
 7 James       8      4         18          8
 8 James       4      8         18         15
 9 James       5      9         17         21
10 James       3      0         12         17

I would expect rows 6 and 7 to show NA in both new columns, because James does not have enough data to sum 3 rows until row 8.

How can I do this?

2 answers:

Answer 0: (score: 2)

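plyr was probably loaded as well, so that mutate from plyr masked mutate from dplyr. plyr::mutate ignores the grouping, which is why the window runs across both persons. We can call dplyr::mutate explicitly: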

library(dplyr)
library(zoo)
df %>% 
 group_by(person) %>% 
 dplyr::mutate(s1_rolling = rollsumr(score1, k = 3, fill = NA),
     s2_rolling = rollsumr(score2, k = 3, fill = NA))
# A tibble: 10 x 5
# Groups:   person [2]
#   person score1 score2 s1_rolling s2_rolling
#   <chr>   <dbl>  <dbl>      <dbl>      <dbl>
# 1 Peter       1      1         NA         NA
# 2 Peter       3      1         NA         NA
# 3 Peter       2      1          6          3
# 4 Peter       5      5         10          7
# 5 Peter       4      1         11          7
# 6 James       6      3         NA         NA
# 7 James       8      4         NA         NA
# 8 James       4      8         18         15
# 9 James       5      9         17         21
#10 James       3      0         12         17

If there is more than one column, we can also use across:

df %>%
   group_by(person) %>%
   dplyr::mutate(across(starts_with('score'), 
       ~ rollsumr(., k = 3, fill = NA), .names = '{col}_rolling'))

For a faster version, use RcppRoll::roll_sumr:

df %>% 
    group_by(person) %>% 
    dplyr::mutate(across(starts_with('score'), 
       ~ RcppRoll::roll_sumr(., 3, fill = NA), .names = '{col}_rolling'))
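
The question also asks about a custom function. As a minimal sketch, zoo::rollapplyr can take any function in place of the built-in sum. The helper window_range below is a hypothetical example (it is not from the question), and the `_range` suffix is just an illustrative column name:

```r
library(dplyr)
library(zoo)

df <- data.frame(person = c(rep("Peter", 5), rep("James", 5)),
                 score1 = c(1,3,2,5,4,6,8,4,5,3),
                 score2 = c(1,1,1,5,1,3,4,8,9,0))

# Hypothetical custom summary: max minus min of the values in the window
window_range <- function(x) max(x) - min(x)

res <- df %>%
  group_by(person) %>%
  # right-aligned window of 3 rows; incomplete windows are filled with NA
  dplyr::mutate(across(starts_with("score"),
                       ~ rollapplyr(.x, width = 3, FUN = window_range, fill = NA),
                       .names = "{.col}_range")) %>%
  ungroup()
```

rollapplyr is generally slower than the specialised rollsumr, so it is probably only worth using when no dedicated rolling function exists for the summary you need.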

The behavior from the question can be reproduced with plyr::mutate:

df %>% 
   group_by(person) %>% 
   plyr::mutate(s1_rolling = rollsumr(score1, k = 3, fill = NA),
          s2_rolling = rollsumr(score2, k = 3, fill = NA))
# A tibble: 10 x 5
# Groups:   person [2]
#   person score1 score2 s1_rolling s2_rolling
#   <chr>   <dbl>  <dbl>      <dbl>      <dbl>
# 1 Peter       1      1         NA         NA
# 2 Peter       3      1         NA         NA
# 3 Peter       2      1          6          3
# 4 Peter       5      5         10          7
# 5 Peter       4      1         11          7
# 6 James       6      3         15          9
# 7 James       8      4         18          8
# 8 James       4      8         18         15
# 9 James       5      9         17         21
#10 James       3      0         12         17
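
To confirm whether masking is the cause in a given session, one option is to ask R which package the unqualified name resolves to. This is only a sketch of a diagnostic; the variable names are illustrative:

```r
library(dplyr)

# Namespace that the bare name `mutate` currently resolves to:
# "dplyr" when dplyr's version is in use, "plyr" when plyr masks it
mutate_source <- environmentName(environment(mutate))

# Every attached package that provides `mutate`, in search-path order;
# the first entry is the one that wins
mutate_providers <- find("mutate")

print(mutate_source)
print(mutate_providers)
```

If plyr turns up first, calling dplyr::mutate explicitly (as above) or attaching plyr before dplyr should restore the grouped behavior.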

Answer 1: (score: 1)

I would suggest slide_dbl() from the slider package, which works in a similar way to zoo and is compatible with dplyr:

library(slider)
library(dplyr)

#Code
# Build dataframe
df <- data.frame(person = c(rep("Peter", 5), rep("James", 5)),
                 score1 = c(1,3,2,5,4,6,8,4,5,3),
                 score2 = c(1,1,1,5,1,3,4,8,9,0))

# Attempt rolling sum by group
df %>% 
  group_by(person) %>% 
  mutate(s1_rolling = slide_dbl(score1, sum, .before = 2, .complete = TRUE),
         s2_rolling = slide_dbl(score2, sum, .before = 2, .complete = TRUE))

Output:

# A tibble: 10 x 5
# Groups:   person [2]
   person score1 score2 s1_rolling s2_rolling
   <fct>   <dbl>  <dbl>      <dbl>      <dbl>
 1 Peter       1      1         NA         NA
 2 Peter       3      1         NA         NA
 3 Peter       2      1          6          3
 4 Peter       5      5         10          7
 5 Peter       4      1         11          7
 6 James       6      3         NA         NA
 7 James       8      4         NA         NA
 8 James       4      8         18         15
 9 James       5      9         17         21
10 James       3      0         12         17
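
As a cross-check that does not depend on which mutate is attached, the same grouped result can be sketched in base R. The helper name roll_sum_right is an assumption made for this illustration, and the sketch assumes every group has at least k rows:

```r
df <- data.frame(person = c(rep("Peter", 5), rep("James", 5)),
                 score1 = c(1,3,2,5,4,6,8,4,5,3),
                 score2 = c(1,1,1,5,1,3,4,8,9,0))

# Sum of the current value and the k - 1 values before it.
# stats::filter is namespaced on purpose, because dplyr masks filter().
# The first k - 1 positions come back as NA (incomplete window).
roll_sum_right <- function(x, k = 3) {
  as.numeric(stats::filter(x, rep(1, k), sides = 1))
}

# ave() applies the function separately within each person
df$s1_rolling <- ave(df$score1, df$person, FUN = roll_sum_right)
df$s2_rolling <- ave(df$score2, df$person, FUN = roll_sum_right)
```

This should give the same two columns as the grouped dplyr versions above, with NA in the first two rows of each person.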