My data looks like this:
library(tidyverse)
Date <- c(rep("5/22/19", 3), rep("5/23/19", 3), rep("5/24/19", 3))
Source <- rep(c("Control", "A", "B"), 3)
ValueA <- c(12080, 12012, 11944, 13345, 13342, 13422, 16226, 16045, 16221)
ValueB <- c(11, 9, 13, 11, 9, 7, 12, 9, 15)
df <- tibble(Date, Source, ValueA, ValueB)
df
# A tibble: 9 x 4
  Date    Source  ValueA ValueB
  <chr>   <chr>    <dbl>  <dbl>
1 5/22/19 Control  12080     11
2 5/22/19 A        12012      9
3 5/22/19 B        11944     13
4 5/23/19 Control  13345     11
5 5/23/19 A        13342      9
6 5/23/19 B        13422      7
7 5/24/19 Control  16226     12
8 5/24/19 A        16045      9
9 5/24/19 B        16221     15
What I want is the cumulative sum of ValueA and ValueB across Date for each Source. The output should look like this:
  Date    Source  ValueA ValueB
1 5/22/19 Control  12080     11
2 5/22/19 A        12012      9
3 5/22/19 B        11944     13
4 5/23/19 Control  25425     22
5 5/23/19 A        25354     18
6 5/23/19 B        25366     20
7 5/24/19 Control  41651     34
8 5/24/19 A        41399     27
9 5/24/19 B        41587     35
However, when I use this code:
df <- df %>%
  group_by(Date, Source) %>%
  summarize(
    ValueA = sum(ValueA, na.rm = TRUE),
    ValueB = sum(ValueB, na.rm = TRUE),
    Cum_A = cumsum(ValueA, na.rm = TRUE),
    Cum_B = cumsum(ValueB, na.rm = TRUE)
  )
I get the error:
Error in cumsum(ValueA, na.rm = TRUE) :
2 arguments passed to 'cumsum' which requires 1
I assume the cumsum function is not designed to handle multiple grouping variables. So how can I get the result I want?
Answer 0: (score: 1)
I think you just need group_by(Source). See if this meets your needs.
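Some notes:
- Keeping Source in arrange() is optional; removing it recreates the data in the order you asked for above. I included it so that the results of cumsum() are easier to see.
- Because no rows are being collapsed within a group (Source or Date), there is no need to summarize; mutate() does the job.
- cumsum() does not accept an na.rm argument, but you can use replace_na() to replace NA with 0 instead.
df <-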
  tibble(
    Date = rep(c("5/22/19", "5/23/19", "5/24/19"), each = 3),
    Source = rep(c("Control", "A", "B"), 3),
    ValueA = c(12080, 12012, 11944, 13345, 13342, 13422, 16226, 16045, 16221),
    ValueB = c(11, 9, 13, NA, 9, 7, 12, 9, 15)
  )
df %>%
  arrange(Source, Date) %>%
  group_by(Source) %>%
  mutate(
    Cum_A = cumsum(replace_na(ValueA, 0)),
    Cum_B = cumsum(replace_na(ValueB, 0))
  ) %>%
  ungroup()
# Date Source ValueA ValueB Cum_A Cum_B
# 5/22/19 A 12012 9 12012 9
# 5/23/19 A 13342 9 25354 18
# 5/24/19 A 16045 9 41399 27
# -----------------------------------------
# 5/22/19 B 11944 13 11944 13
# 5/23/19 B 13422 7 25366 20
# 5/24/19 B 16221 15 41587 35
# -----------------------------------------
# 5/22/19 Control 12080 11 12080 11
# 5/23/19 Control 13345 NA 25425 11
# 5/24/19 Control 16226 12 41651 23
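As a quick check of the note about na.rm, here is a minimal base R sketch (the vector x is made up for illustration and is not part of the original data). It suggests that the error in the question comes from cumsum() taking a single argument rather than from the grouping, and that an NA carries into every later running total unless it is replaced first:

```r
# Hypothetical vector, for illustration only
x <- c(11, NA, 12)

# cumsum() takes exactly one argument, so passing na.rm raises an error;
# capture the message instead of stopping the script
msg <- tryCatch(
  cumsum(x, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
print(msg)

# Without handling NA, every element from the first NA onward is NA
naive <- cumsum(x)
print(naive)

# Replacing NA with 0 first keeps the running total going
filled <- cumsum(ifelse(is.na(x), 0, x))
print(filled)
```

This is the same reason Control's Cum_B stays at 11 on 5/23/19 in the output above: the NA is counted as 0.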
Answer 1: (score: 0)
Try using tally() (from dplyr) instead of cumsum or summarize:
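df <- df %>%
  # group_by_() is deprecated; group_by() with bare column names is the current form
  group_by(Date, Source, ValueA, ValueB) %>%
  # tally() counts the rows in each group; the count column n is then dropped
  tally() %>%
  select(-n)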
This approach sorts the output by Source in ascending order, but from there it should be fairly simple to reorder the data however you like.
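On the point about reordering, the following is a rough sketch rather than a definitive solution. It assumes the original data from the question (no NA values), that the dates are in month/day/two-digit-year form, and that the desired Source order is Control, A, B. Unlike the first answer, it overwrites ValueA and ValueB with their running totals so the result has the same shape as the table requested in the question:

```r
library(dplyr)

df <- tibble(
  Date   = rep(c("5/22/19", "5/23/19", "5/24/19"), each = 3),
  Source = rep(c("Control", "A", "B"), 3),
  ValueA = c(12080, 12012, 11944, 13345, 13342, 13422, 16226, 16045, 16221),
  ValueB = c(11, 9, 13, 11, 9, 7, 12, 9, 15)
)

result <- df %>%
  mutate(
    # Parse the text dates so sorting is chronological, not alphabetical
    Date = as.Date(Date, format = "%m/%d/%y"),
    # Assumed display order for Source (Control first, then A, then B)
    Source = factor(Source, levels = c("Control", "A", "B"))
  ) %>%
  # Sort first; a grouped mutate() keeps rows in this order
  arrange(Date, Source) %>%
  group_by(Source) %>%
  # Running total of both value columns within each Source
  mutate(across(c(ValueA, ValueB), cumsum)) %>%
  ungroup()

print(result)
```

Sorting before grouping matters here: cumsum() simply walks down the rows of each group, so the rows need to be in date order before the running totals are computed.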