Rolling sum in dplyr

Time: 2018-07-18 13:47:32

Tags: r dplyr rolling-sum

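library(dplyr)
# R >= 3.6.0 samples differently; RNGkind(sample.kind = "Rounding") reproduces the values shown below
set.seed(123)

df <- data.frame(x = sample(1:10, 20, replace = TRUE), id = rep(1:2, each = 10))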

For each id, I want to create a column containing the sum of the previous 5 values of x.

df %>% group_by(id) %>% mutate(roll.sum = c(x[1:4], zoo::rollapply(x, 5, sum)))
# Groups:   id [2]
  x    id roll.sum
<int> <int>    <int>
 3     1        3
 8     1        8
 5     1        5
 9     1        9
10     1       10
 1     1       36
 6     1       39
 9     1       40
 6     1       41
 5     1       37
10     2       10
 5     2        5
 7     2        7
 6     2        6
 2     2        2
 9     2       39
 3     2       32
 1     2       28
 4     2       25
10     2       29

Row 6 should be 35 (3 + 8 + 5 + 9 + 10), row 7 should be 33 (8 + 5 + 9 + 10 + 1), and so on.

However, the code above also includes the current row itself in the sum. How can I fix this?
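For reference, the expected sums can be reproduced with dplyr alone. This is a sketch that is not taken from either answer: it adds lag(x, 1) through lag(x, 5) within each id, and hard-codes the x values from the output above so it does not depend on the random seed. The column name sum_prev is only illustrative.

```r
library(dplyr)

# x values copied from the question's output, hard-coded so the result does not
# depend on which sample() algorithm your R version uses
df <- data.frame(
  x  = c(3L, 8L, 5L, 9L, 10L, 1L, 6L, 9L, 6L, 5L,
         10L, 5L, 7L, 6L, 2L, 9L, 3L, 1L, 4L, 10L),
  id = rep(1:2, each = 10)
)

res <- df %>%
  group_by(id) %>%
  # lag(x, k) is the value k rows earlier in the same group, or NA if there is
  # none, so the sum covers only the previous five rows and is NA in the first
  # five rows of each group
  mutate(sum_prev = lag(x, 1) + lag(x, 2) + lag(x, 3) + lag(x, 4) + lag(x, 5)) %>%
  ungroup()

print(res$sum_prev)
```

Writing out each lag is only practical for a small, fixed window; the rolling-function approaches in the answers generalize to other widths.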

2 answers:

Answer 0 (score: 5)

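 library(zoo)
 library(dplyr)
 df %>%  group_by(id) %>%
       # a list width means offsets: -(1:5) sums the 5 values before the current row
       mutate(Sum_prev = rollapply(x, list(-(1:5)), sum, fill = NA, align = "right", partial = FALSE))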

# To sum the next 5 elements, skipping the current one, use
# rollapply(x, list(1:5), sum, fill = NA, align = "left", partial = FALSE)


     x id Sum_prev
 1   3  1         NA
 2   8  1         NA
 3   5  1         NA
 4   9  1         NA
 5  10  1         NA
 6   1  1         35
 7   6  1         33
 8   9  1         31
 9   6  1         35
 10  5  1         32
 11 10  2         NA
 12  5  2         NA
 13  7  2         NA
 14  6  2         NA
 15  2  2         NA
 16  9  2         30
 17  3  2         29
 18  1  2         27
 19  4  2         21
 20 10  2         19
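To see what the list form of width does, here is a toy vector (the name v is illustrative). When width is a list, zoo treats its elements as offsets relative to the current position rather than as a window width, so list(-(1:5)) selects the five previous elements and list(1:5) the five following ones; positions where the offsets run past the data get the fill value.

```r
library(zoo)

v <- 1:8

# Offsets -1..-5: sum of the five elements before each position
prev5 <- rollapply(v, list(-(1:5)), sum, fill = NA)
# Offsets 1..5: sum of the five elements after each position
next5 <- rollapply(v, list(1:5), sum, fill = NA)

print(prev5)  # NA NA NA NA NA 15 20 25
print(next5)  # 20 25 30 NA NA NA NA NA
```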

Answer 1 (score: 2)

You can use the rollify function from the tibbletime package. You can read about it in this vignette: Rolling calculations in tibbletime

library(tibbletime)
library(dplyr)
# rollify() turns sum() into a function that computes a rolling sum over a 5-element window
rollig_sum <- rollify(.f = sum, window = 5)

df %>% 
  group_by(id) %>% 
  mutate(roll.sum = lag(rollig_sum(x))) # lag() shifts the result down one row, excluding the current row
# A tibble: 20 x 3
# Groups:   id [2]
#       x    id roll.sum
#   <int> <int>    <int>
# 1     3     1       NA
# 2     8     1       NA
# 3     5     1       NA
# 4     9     1       NA
# 5    10     1       NA
# 6     1     1       35
# 7     6     1       33
# 8     9     1       31
# 9     6     1       35
#10     5     1       32
#11    10     2       NA
#12     5     2       NA
#13     7     2       NA
#14     6     2       NA
#15     2     2       NA
#16     9     2       30
#17     3     2       29
#18     1     2       27
#19     4     2       21
#20    10     2       19
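The lag() step is what excludes the current row: a right-aligned rolling sum at row i covers rows i-4 to i, and shifting it down by one makes row i cover rows i-5 to i-1. The sketch below shows this on a toy vector, swapping in zoo::rollapplyr (zoo's right-aligned rollapply) for tibbletime's rollify, on the assumption that both give the same right-aligned result for a plain sum.

```r
library(zoo)
library(dplyr)

v <- 1:8

# Right-aligned window of width 5: element i is sum(v[(i - 4):i]), so the
# current element is included and the first four positions are NA
incl <- rollapplyr(v, 5, sum, fill = NA)

# Shift down one position: element i becomes sum(v[(i - 5):(i - 1)])
prev5 <- dplyr::lag(incl)

print(incl)   # NA NA NA NA 15 20 25 30
print(prev5)  # NA NA NA NA NA 15 20 25
```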

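If you want something other than NA in those leading rows, you can use if_else(); for example, this falls back to the row's own x value: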

df %>% 
  group_by(id) %>% 
  mutate(roll.sum = lag(rollig_sum(x))) %>%
  mutate(roll.sum = if_else(is.na(roll.sum), x, roll.sum))
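As an alternative not used in the answer, dplyr's coalesce() expresses the same replacement more compactly by taking the first non-missing value position by position. A minimal sketch with made-up vectors (roll_sum and x here are illustrative, not the answer's data):

```r
library(dplyr)

roll_sum <- c(NA, NA, 15L, 20L)
x <- c(1L, 2L, 3L, 4L)

# The if_else() form from the answer
via_if_else <- if_else(is.na(roll_sum), x, roll_sum)
# coalesce(): use roll_sum where present, otherwise fall back to x
via_coalesce <- coalesce(roll_sum, x)

print(via_coalesce)  # 1 2 15 20
```

Inside the pipeline this would read mutate(roll.sum = coalesce(roll.sum, x)).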