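I want to compute a rolling sum (or a custom function) over the 3 previous values, treating each group separately. This is what I have tried: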
library(dplyr)
library(zoo)  # provides rollsumr()
# Build dataframe
df <- data.frame(person = c(rep("Peter", 5), rep("James", 5)),
                 score1 = c(1,3,2,5,4,6,8,4,5,3),
                 score2 = c(1,1,1,5,1,3,4,8,9,0))
# Attempt rolling sum by group
df %>%
  group_by(person) %>%
  mutate(s1_rolling = rollsumr(score1, k = 3, fill = NA),
         s2_rolling = rollsumr(score2, k = 3, fill = NA))
However, the new columns do not treat each group separately; the rolling sum carries on across the whole data set:
   person score1 score2 s1_rolling s2_rolling
   <chr>   <dbl>  <dbl>      <dbl>      <dbl>
 1 Peter       1      1         NA         NA
 2 Peter       3      1         NA         NA
 3 Peter       2      1          6          3
 4 Peter       5      5         10          7
 5 Peter       4      1         11          7
 6 James       6      3         15          9
 7 James       8      4         18          8
 8 James       4      8         18         15
 9 James       5      9         17         21
10 James       3      0         12         17
I expected rows 6 and 7 to show NA in both new columns, because James does not have 3 rows to sum until row 8.
How can I do this?
Answer 0 (score: 2)
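plyr was probably loaded as well, so that mutate from plyr masked mutate from dplyr. plyr::mutate ignores the grouping set by group_by, so the window runs over the whole column. We can call dplyr::mutate explicitly: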
library(dplyr)
library(zoo)
df %>%
  group_by(person) %>%
  dplyr::mutate(s1_rolling = rollsumr(score1, k = 3, fill = NA),
                s2_rolling = rollsumr(score2, k = 3, fill = NA))
# A tibble: 10 x 5
# Groups: person [2]
#   person score1 score2 s1_rolling s2_rolling
#   <chr>   <dbl>  <dbl>      <dbl>      <dbl>
# 1 Peter       1      1         NA         NA
# 2 Peter       3      1         NA         NA
# 3 Peter       2      1          6          3
# 4 Peter       5      5         10          7
# 5 Peter       4      1         11          7
# 6 James       6      3         NA         NA
# 7 James       8      4         NA         NA
# 8 James       4      8         18         15
# 9 James       5      9         17         21
#10 James       3      0         12         17
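To confirm that masking is the cause, it can help to check which package a bare mutate call resolves to. The following is a minimal sketch using only base R and utils functions; the variable names mutate_sources and mutate_owner are illustrative, not part of the original answer.

```r
library(dplyr)

# All attached packages that export `mutate`, in search-path order.
# A bare `mutate()` call resolves to the first entry.
mutate_sources <- find("mutate")
print(mutate_sources)

# Name of the namespace that the currently visible `mutate` belongs to.
mutate_owner <- environmentName(environment(mutate))
print(mutate_owner)
```

If plyr was attached after dplyr, "package:plyr" would be expected to appear first in mutate_sources and mutate_owner would be "plyr".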
If there is more than one column, we can also use across:
df %>%
  group_by(person) %>%
  dplyr::mutate(across(starts_with('score'),
                       ~ rollsumr(., k = 3, fill = NA), .names = '{col}_rolling'))
For a faster version, use RcppRoll::roll_sumr:
df %>%
  group_by(person) %>%
  dplyr::mutate(across(starts_with('score'),
                       ~ RcppRoll::roll_sumr(., 3, fill = NA), .names = '{col}_rolling'))
The behavior in the question can be reproduced with plyr::mutate:
df %>%
  group_by(person) %>%
  plyr::mutate(s1_rolling = rollsumr(score1, k = 3, fill = NA),
               s2_rolling = rollsumr(score2, k = 3, fill = NA))
# A tibble: 10 x 5
# Groups: person [2]
#   person score1 score2 s1_rolling s2_rolling
#   <chr>   <dbl>  <dbl>      <dbl>      <dbl>
# 1 Peter       1      1         NA         NA
# 2 Peter       3      1         NA         NA
# 3 Peter       2      1          6          3
# 4 Peter       5      5         10          7
# 5 Peter       4      1         11          7
# 6 James       6      3         15          9
# 7 James       8      4         18          8
# 8 James       4      8         18         15
# 9 James       5      9         17         21
#10 James       3      0         12         17
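The question also asks about a custom function rather than a plain sum. One way this might be done, sketched here with zoo::rollapplyr, is to pass any function of the window as FUN. The helper window_range and the column name s1_range are made up for illustration and do not come from the original answer.

```r
library(dplyr)
library(zoo)

df <- data.frame(person = c(rep("Peter", 5), rep("James", 5)),
                 score1 = c(1,3,2,5,4,6,8,4,5,3),
                 score2 = c(1,1,1,5,1,3,4,8,9,0))

# Hypothetical custom summary: spread (max - min) of the values in a window.
window_range <- function(x) max(x) - min(x)

res <- df %>%
  group_by(person) %>%
  # rollapplyr is right-aligned: each window ends at the current row.
  dplyr::mutate(s1_range = rollapplyr(score1, width = 3,
                                      FUN = window_range, fill = NA)) %>%
  ungroup()

print(res)
```

As with rollsumr, the first two rows of each person are NA because the window is not yet complete.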
Answer 1 (score: 1)
I would suggest the slide_dbl() function from the slider package, which works similarly to zoo and is compatible with dplyr:
library(slider)
library(dplyr)
#Code
# Build dataframe
df <- data.frame(person = c(rep("Peter", 5), rep("James", 5)),
                 score1 = c(1,3,2,5,4,6,8,4,5,3),
                 score2 = c(1,1,1,5,1,3,4,8,9,0))
# Attempt rolling sum by group
df %>%
  group_by(person) %>%
  mutate(s1_rolling = slide_dbl(score1, sum, .before = 2, .complete = TRUE),
         s2_rolling = slide_dbl(score2, sum, .before = 2, .complete = TRUE))
Output:
# A tibble: 10 x 5
# Groups: person [2]
   person score1 score2 s1_rolling s2_rolling
   <fct>   <dbl>  <dbl>      <dbl>      <dbl>
 1 Peter       1      1         NA         NA
 2 Peter       3      1         NA         NA
 3 Peter       2      1          6          3
 4 Peter       5      5         10          7
 5 Peter       4      1         11          7
 6 James       6      3         NA         NA
 7 James       8      4         NA         NA
 8 James       4      8         18         15
 9 James       5      9         17         21
10 James       3      0         12         17
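To see what the two arguments contribute, here is a small sketch on Peter's score1 values alone. .before = 2 makes each window the current value plus the two before it, and .complete = TRUE returns NA whenever fewer than three values are available. The variable names are illustrative.

```r
library(slider)

x <- c(1, 3, 2, 5, 4)  # Peter's score1 values

# Only complete windows of 3 values are summed; the first two results are NA.
complete_only <- slide_dbl(x, sum, .before = 2, .complete = TRUE)
print(complete_only)   # NA NA 6 10 11

# Default (.complete = FALSE): partial windows at the start are summed too.
partial_ok <- slide_dbl(x, sum, .before = 2)
print(partial_ok)      # 1 4 6 10 11
```

Replacing sum with another function of the window (for example mean) gives a custom rolling statistic in the same way.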