Consider the following dummy dataset:
library(dplyr)
df <- structure(list(x = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 7L,
                                     1L, 2L, 3L, 4L, 5L, 6L, 7L, 7L),
                                   .Label = c("1", "2", "3", "4",
                                              "5", "6", "Total"), class = "factor"),
                     y = structure(c(1L, 1L,
                                     2L, 2L, 3L, 3L, 4L, 4L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L),
                                   .Label = c("7", "8", "9", "Total"), class = "factor"),
                     z = structure(c(1L, 2L,
                                     1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
                                   .Label = c("10", "11"), class = "factor"),
                     count = c(56, 89, 12, 119, 3, 2, 71,
                               210, 22, 64, 53, 0, 136, 11, 211, 75),
                     date = structure(c(17866,
                                        17866, 17866, 17866, 17866, 17866, 17866, 17866, 17501, 17501,
                                        17501, 17501, 17501, 17501, 17501, 17501), class = "Date")),
                class = "data.frame",
                row.names = c(NA, -16L),
                .Names = c("x", "y", "z", "count", "date")) %>%
  filter(count != 0)
> df
       x     y  z count       date
1      1     7 10    56 2018-12-01
2      2     7 11    89 2018-12-01
3      3     8 10    12 2018-12-01
4      4     8 11   119 2018-12-01
5      5     9 10     3 2018-12-01
6      6     9 11     2 2018-12-01
7  Total Total 10    71 2018-12-01
8  Total Total 11   210 2018-12-01
9      1     7 10    22 2017-12-01
10     2     7 11    64 2017-12-01
11     3     8 10    53 2017-12-01
12     5     9 10   136 2017-12-01
13     6     9 11    11 2017-12-01
14 Total Total 10   211 2017-12-01
15 Total Total 11    75 2017-12-01
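I'm interested in computing the year-over-year (YoY) percent change, with a slight modification.
Here is the unmodified version (not what I want, but close):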
df_yoy <- df %>%
  group_by(x, y, z) %>%
  summarize(YoY = count[date == max(date)]/count[date == min(date)] - 1) %>%
  as.data.frame()
> df_yoy
      x     y  z        YoY
1     1     7 10  1.5454545
2     2     7 11  0.3906250
3     3     8 10 -0.7735849
4     4     8 11  0.0000000
5     5     9 10 -0.9779412
6     6     9 11 -0.8181818
7 Total Total 10 -0.6635071
8 Total Total 11  1.8000000 <-- obtained by doing 210/75-1
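Notice how I specifically called out the last row. Here are the requirements for what I want:
The count values must stay unchanged.
count was not measured on 2017-12-01 when x == 4 & y == 8 & z == 11. Therefore, when computing the YoY percent change for the Total row, the count for x == 4 & y == 8 & z == 11 needs to be excluded from the numerator, count[date == max(date)].
So here is the output I'm looking for: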
> df_yoy
      x     y  z        YoY
1     1     7 10  1.5454545
2     2     7 11  0.3906250
3     3     8 10 -0.7735849
4     4     8 11  0.0000000
5     5     9 10 -0.9779412
6     6     9 11 -0.8181818
7 Total Total 10 -0.6635071
8 Total Total 11  0.2133333 <-- obtained by doing (210-119)/75-1
Note that 119, the count value when x == 4 & y == 8 & z == 11, is subtracted from 210.
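To spell out the arithmetic for that last row, here is a small check that uses only the numbers shown above. The variable names unadjusted and adjusted are made up for illustration.

```r
# YoY change for the Total row with z == 11, using the counts from the table.
total_2018 <- 210   # Total count on 2018-12-01
total_2017 <- 75    # Total count on 2017-12-01
unmeasured <- 119   # 2018 count for x == 4 & y == 8 & z == 11 (no 2017 measurement)

# Unmodified version: every 2018 count stays in the numerator.
unadjusted <- total_2018 / total_2017 - 1

# Desired version: the unmatched 2018 count is removed from the numerator only.
adjusted <- (total_2018 - unmeasured) / total_2017 - 1

print(round(c(unadjusted = unadjusted, adjusted = adjusted), 7))
```

The two values correspond to the 1.8000000 and 0.2133333 entries in the tables above.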
Is there a way to modify summarize() to make this change? I have tried using ifelse() and case_when(), but without success.
Answer 0 (score: 1)
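A solution can be reached by ungrouping and then regrouping the data to transform it with dplyr.
Note: the solution could be written more concisely, but I chose to write it out in a more detailed way so that the OP and other readers can follow the logic more easily.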
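library(dplyr)
# Assumes df still contains the count == 0 row (i.e. before filter(count != 0));
# that row becomes NA here and marks the combination with no earlier measurement.
df %>% mutate(count = ifelse(count == 0, NA, count)) %>%
  group_by(x, y, z) %>%
  summarize(YoYNume = count[date == max(date)], YoYDeno = count[date == min(date)]) %>%
  group_by(z) %>%
  # Per z, sum the latest counts of combinations whose earlier count is missing
  mutate(valueToDiscard = sum(ifelse(is.na(YoYDeno), YoYNume, 0))) %>%
  # Only the Total row is adjusted; every other row keeps its own numerator
  mutate(YoYNume = ifelse(x == "Total", YoYNume - valueToDiscard, YoYNume)) %>%
  group_by(x, y, z) %>%
  summarise(YoY = YoYNume/YoYDeno - 1) %>%
  as.data.frame()
#       x     y  z        YoY
# 1     1     7 10  1.5454545
# 2     2     7 11  0.3906250
# 3     3     8 10 -0.7735849
# 4     4     8 11         NA
# 5     5     9 10 -0.9779412
# 6     6     9 11 -0.8181818
# 7 Total Total 10 -0.6635071
# 8 Total Total 11  0.2133333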
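If df has already been filtered with filter(count != 0), as in the question, the x == 4 & y == 8 & z == 11 group has only one date, so there is no NA to detect. A possible variant for that case is sketched below. It is not the answerer's code: the names first_date, missing_first and discard are made up for illustration, the data is retyped compactly with character columns instead of factors, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Compact copy of the question's filtered data (zero-count row already dropped).
df <- data.frame(
  x = c("1", "2", "3", "4", "5", "6", "Total", "Total",
        "1", "2", "3", "5", "6", "Total", "Total"),
  y = c("7", "7", "8", "8", "9", "9", "Total", "Total",
        "7", "7", "8", "9", "9", "Total", "Total"),
  z = c("10", "11", "10", "11", "10", "11", "10", "11",
        "10", "11", "10", "10", "11", "10", "11"),
  count = c(56, 89, 12, 119, 3, 2, 71, 210, 22, 64, 53, 136, 11, 211, 75),
  date = as.Date(rep(c("2018-12-01", "2017-12-01"), times = c(8, 7)))
)

# Earliest date in the whole data set, used to spot groups with no early row.
first_date <- min(df$date)

df_yoy <- df %>%
  group_by(x, y, z) %>%
  summarise(
    nume = count[date == max(date)],         # latest count in the group
    deno = count[date == min(date)],         # earliest count in the group
    missing_first = min(date) > first_date,  # TRUE if no row on first_date
    .groups = "drop"
  ) %>%
  group_by(z) %>%
  # Per z, add up the latest counts of non-Total groups lacking an early row.
  mutate(discard = sum(nume[missing_first & x != "Total"])) %>%
  ungroup() %>%
  # Only the Total rows have the unmatched counts removed from the numerator.
  mutate(YoY = ifelse(x == "Total", nume - discard, nume) / deno - 1) %>%
  select(x, y, z, YoY) %>%
  as.data.frame()

print(df_yoy)
```

With this sketch the x == 4 & y == 8 & z == 11 row keeps a YoY of 0, because its only count is divided by itself, as in the question's desired output, while the Total row for z == 11 is computed as (210 - 119)/75 - 1.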