I want to compute monthly subtotals for my data frame (df) that do not accumulate across months.
"date" "id" "change"
2010-01-01 1 NA
2010-01-07 2 3
2010-01-15 2 -1
2010-02-01 1 NA
2010-02-04 2 7
2010-02-22 2 -2
2010-02-26 2 4
2010-03-01 1 NA
2010-03-14 2 -4
2010-04-01 1 NA
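A new period starts on the first day of each month. The column "id" is a grouping variable that marks the start of a new period (== 1) and an observation within a period (== 2). The goal is to sum all changes within a month and then restart from 0 in the next period. The output should be stored in an additional column of df.
Here is a reproducible example of my data frame: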
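Because every id == 1 row falls on the first day of a month, the periods defined by "id" and the calendar months should line up in this data. The base R sketch below checks that assumption on the example values. The names `period` and `month` are illustrative only: `period` is a running count of id == 1 rows, and `month` is the year-month label of each date.

```r
# id and date columns copied from the example data
id   <- c(1, 2, 2, 1, 2, 2, 2, 1, 2, 1)
date <- as.Date(c("2010-01-01", "2010-01-07", "2010-01-15", "2010-02-01", "2010-02-04",
                  "2010-02-22", "2010-02-26", "2010-03-01", "2010-03-14", "2010-04-01"))

# Each id == 1 row opens a new period, so a running count of those rows numbers the periods
period <- cumsum(id == 1)          # 1 1 1 2 2 2 2 3 3 4

# Calendar year-month of each row
month <- format(date, "%Y-%m")

# The cross-table shows a one-to-one match between periods and months in this data
print(table(period, month))
```

If the data could contain periods that do not start on the 1st, grouping by `period` rather than by `month` follows the "id" definition more closely.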
library(dplyr)
library(tidyr)
library(lubridate)
date <- ymd(c("2010-01-01","2010-01-07","2010-01-15","2010-02-01","2010-02-04","2010-02-22","2010-02-26","2010-03-01","2010-03-14","2010-04-01"))
df <- data.frame(date)
df$id <- c(1, 2, 2, 1, 2, 2, 2, 1, 2, 1)  # 1 = start of a new period, 2 = observation within it
df$change <- c(NA, 3, -1, NA, 7, -2, 4, NA, -4, NA)
What I tried:
df <- df %>%
  group_by(id) %>%
  mutate(total = cumsum(change)) %>%
  ungroup() %>%
  fill(total, .direction = "down") %>%
  filter(id == 1)
This produces the following output:
"date" "id" "change" "total"
2010-01-01 1 NA NA
2010-02-01 1 NA 2
2010-03-01 1 NA 11
2010-04-01 1 NA 7
The problem is cumsum: it accumulates all previous values within a group and does not restart from 0 in a new period.
The desired output is:
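To make the difference concrete, here is a small base R sketch on just the id == 2 changes from the example. It uses `ave()` with `FUN = cumsum` as a stand-in for a grouped `mutate(cumsum())`; the vectors `x` and `month` are copied by hand from the data above.

```r
# The id == 2 changes and their year-month, copied from the example
x     <- c(3, -1, 7, -2, 4, -4)
month <- c("2010-01", "2010-01", "2010-02", "2010-02", "2010-02", "2010-03")

running   <- cumsum(x)                    # 3 2 9 7 11 7: never resets
per_month <- ave(x, month, FUN = cumsum)  # 3 2 7 5 9 -4: restarts each month
```

The last value of each month in `running` (2, 11, 7) is what the attempt above reports, while the last value of each month in `per_month` (2, 9, -4) matches the desired totals.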
"date" "id" "change" "total"
2010-01-01 1 NA NA
2010-02-01 1 NA 2
2010-03-01 1 NA 9
2010-04-01 1 NA -4
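Each row with "id" == 1 should show the sum of the changes in the preceding "id" == 2 rows, with each period restarting from 0. Is there a specific command for this kind of problem? Can anyone suggest an alternative to the code above?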
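One possible alternative, sketched in base R rather than dplyr: sum the changes per period (numbered by counting id == 1 rows), then write each period's subtotal onto the id == 1 row that opens the next period. This keeps every row of df and stores the result in an extra `total` column. It assumes the data is sorted by date and that the first row is an id == 1 row; the names `period` and `subtotal` are illustrative.

```r
df <- data.frame(
  date   = as.Date(c("2010-01-01", "2010-01-07", "2010-01-15", "2010-02-01", "2010-02-04",
                     "2010-02-22", "2010-02-26", "2010-03-01", "2010-03-14", "2010-04-01")),
  id     = c(1, 2, 2, 1, 2, 2, 2, 1, 2, 1),
  change = c(NA, 3, -1, NA, 7, -2, 4, NA, -4, NA)
)

# Number the periods; assumes row 1 is an id == 1 row, so numbering starts at 1
period <- cumsum(df$id == 1)

# Total change inside each period; the NA on each id == 1 row is skipped
subtotal <- as.vector(tapply(df$change, period, sum, na.rm = TRUE))

# Each id == 1 row reports the subtotal of the period that just ended;
# the first id == 1 row has no preceding period, so it stays NA
df$total <- NA_real_
df$total[df$id == 1] <- c(NA, head(subtotal, -1))

print(df[df$id == 1, ])
```

Because it sums per period instead of carrying values forward, a period with no id == 2 rows yields 0 rather than repeating an earlier total.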
Answer 0 (score: 1)
We can add the "date" column, formatted as year-month, to the grouping variables so that the running sum resets every month:
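library(dplyr)
library(tidyr)  # provides fill()
df %>%
  group_by(id, grp = format(date, "%Y-%m")) %>%  # running sum per id and year-month
  mutate(total = cumsum(change)) %>%
  ungroup() %>%
  fill(total, .direction = "down") %>%           # copy the last non-NA total onto the following id == 1 row
  filter(id == 1) %>%
  select(-grp)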
# A tibble: 4 x 4
# date id change total
# <date> <dbl> <dbl> <dbl>
#1 2010-01-01 1 NA NA
#2 2010-02-01 1 NA 2
#3 2010-03-01 1 NA 9
#4 2010-04-01 1 NA -4
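One caveat, under a scenario the question does not cover: `fill(.direction = "down")` copies the last non-NA value downward, so if a month had no id == 2 rows, the next id == 1 row would repeat the previous month's subtotal instead of showing 0. The base R sketch below reproduces this on hypothetical four-row data. `ave()` stands in for `group_by()` + `cumsum()`, and `fill_down()` is a hypothetical helper approximating `tidyr::fill(.direction = "down")`. It then contrasts the result with summing per id-delimited period.

```r
# Hypothetical data: February has no id == 2 rows at all
id     <- c(1, 2, 1, 1)
change <- c(NA, 3, NA, NA)
month  <- c("2010-01", "2010-01", "2010-02", "2010-03")

# Running sum per (id, month), mirroring group_by(id, grp) + cumsum()
total <- ave(change, id, month, FUN = cumsum)

# Hypothetical stand-in for tidyr::fill(.direction = "down"):
# replace each NA with the most recent non-NA value above it
fill_down <- function(x) {
  idx <- cumsum(!is.na(x))
  c(NA, x[!is.na(x)])[idx + 1]
}
answer_style <- fill_down(total)[id == 1]   # NA 3 3: March repeats January's subtotal

# Summing per period (counted from id == 1 rows) gives 0 for the empty month
period <- cumsum(id == 1)
subtotal <- as.vector(tapply(change, period, sum, na.rm = TRUE))
period_style <- c(NA, head(subtotal, -1))   # NA 3 0
```

For the example data in the question every month contains id == 2 rows, so both approaches agree there.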