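I'm trying to turn a data frame into summary data: a yearly total per name, plus a percentage breakdown of that total by subcategory. For example, if I have this data: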
name year prod_type prod_color revenue
a 2012 car red 1000
b 2012 car blue 2000
c 2012 boat red 4000
d 2012 plane blue 5000
a 2014 boat green 9000
b 2014 car red 2000
c 2014 plane blue 6000
a 2014 plane blue 10000
I want to create a table like this:
name year yr_total_rev pct_car_rev pct_boat_rev pct_plane_rev pct_red_car_rev pct_blue_car_rev
1 a 2012 1000 NA NA NA NA NA
2 a 2014 19000 NA NA NA NA NA
3 b 2012 2000 NA NA NA NA NA
4 b 2014 2000 NA NA NA NA NA
5 c 2012 4000 NA NA NA NA NA
6 c 2014 6000 NA NA NA NA NA
7 d 2012 5000 NA NA NA NA NA
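Except that instead of NA, each cell should hold the percentage of "yr_total_rev" for that name/year pair. For example, for name a, car revenue would be 100% in 2012 but 0% in 2014, while boat and plane revenue would split the 2014 total (about 47% and 53%), and so on.
Thanks in advance for any help!
Sample data below: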
df <- data.frame("name"=c(letters[1:4], c(letters[1:3], "a")),
"year"=c(rep(2012,4), rep(2014, 4)),
"prod_type"=c("car","car","boat","plane","boat","car","plane","plane"),
"prod_color"=c("red","blue","red","blue","green","red","blue","blue"),
"revenue"=c(1000,2000,4000,5000,9000,2000,6000, 10000))
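To make the percentage definition concrete, here is a base-R sketch (no packages) that reproduces the yr_total_rev values and the type breakdown for one pair, name a in 2014. The helper names `totals`, `sub`, `types` and `pct` are illustrative only. Declaring the full set of product types as factor levels, together with `tapply(..., default = 0)`, is what makes an absent type show up as 0 rather than being dropped.

```r
# Sample data from the question (recreated so this block runs on its own)
df <- data.frame(
  name       = c(letters[1:4], letters[1:3], "a"),
  year       = c(rep(2012, 4), rep(2014, 4)),
  prod_type  = c("car", "car", "boat", "plane", "boat", "car", "plane", "plane"),
  prod_color = c("red", "blue", "red", "blue", "green", "red", "blue", "blue"),
  revenue    = c(1000, 2000, 4000, 5000, 9000, 2000, 6000, 10000),
  stringsAsFactors = FALSE
)

# Total revenue for every name/year pair (the yr_total_rev column)
totals <- aggregate(revenue ~ name + year, data = df, FUN = sum)
print(totals)

# Percentage breakdown by product type for a single pair: name "a", year 2014
sub <- df[df$name == "a" & df$year == 2014, ]
# Listing all types as levels keeps "car" even though it has no rows here
types <- factor(sub$prod_type, levels = c("car", "boat", "plane"))
# Sum revenue per type (empty types get 0), then divide by the pair's total
pct <- tapply(sub$revenue, types, sum, default = 0) / sum(sub$revenue)
print(round(100 * pct, 1))
```

For this pair the shares come out to 0 for car, about 47.4 for boat and about 52.6 for plane, which is the "0% vs. roughly half" pattern described above.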
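Answer (score: 3)
The code below joins three separate summaries: the yearly total, the share by product type, and the share by product type and color: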
library(tidyverse)

# 1) Total revenue per name/year pair
dat.summary = df %>% group_by(name, year) %>%
  summarise(yr_total=sum(revenue)) %>%
  # 2) Share of each prod_type within its name/year pair, one column per type
  left_join(df %>% group_by(name, year, prod_type) %>%
              summarise(rev=sum(revenue)) %>%
              group_by(name, year) %>%
              mutate(Percent=rev/sum(rev)) %>%
              select(-rev) %>%
              spread(prod_type, Percent),
            by = c("name", "year")) %>%
  # 3) Share of each prod_type/prod_color combination, one column per combination
  left_join(df %>% group_by(name, year, prod_type, prod_color) %>%
              summarise(rev=sum(revenue)) %>%
              group_by(name, year) %>%
              mutate(Percent=rev/sum(rev)) %>%
              unite(type_color, prod_type, prod_color) %>%
              select(-rev) %>%
              spread(type_color, Percent),
            by = c("name", "year"))
  name year yr_total      boat car     plane boat_green boat_red car_blue car_red plane_blue
1    a 2012     1000        NA   1        NA         NA       NA       NA       1         NA
2    a 2014    19000 0.4736842  NA 0.5263158  0.4736842       NA       NA      NA  0.5263158
3    b 2012     2000        NA   1        NA         NA       NA        1      NA         NA
4    b 2014     2000        NA   1        NA         NA       NA       NA       1         NA
5    c 2012     4000 1.0000000  NA        NA         NA        1       NA      NA         NA
6    c 2014     6000        NA  NA 1.0000000         NA       NA       NA      NA  1.0000000
7    d 2012     5000        NA  NA 1.0000000         NA       NA       NA      NA  1.0000000
Wrapping the repeated steps in a function shortens this a bit:
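# Sum revenue over the given grouping columns, then compute each row's share
# of its name/year total. Note: this reads the global data frame `df`.
fnc = function(...) {
  df %>% group_by(!!!quos(...)) %>%
    summarise(rev=sum(revenue)) %>%
    group_by(name, year) %>%
    mutate(Percent=rev/sum(rev))
}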
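# fnc(name, year) gives the yearly total in `rev`; rename it to match the first version
dat.summary = fnc(name, year) %>% select(-Percent) %>%
  rename(yr_total = rev) %>%
  left_join(fnc(name, year, prod_type) %>%
              select(-rev) %>%
              spread(prod_type, Percent),
            by = c("name", "year")) %>%
  left_join(fnc(name, year, prod_type, prod_color) %>%
              unite(type_color, prod_type, prod_color) %>%
              select(-rev) %>%
              spread(type_color, Percent),
            by = c("name", "year"))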
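In current tidyr, `spread()` is superseded by `pivot_wider()`, which can also address the question's wish for 0% instead of NA through `values_fill`, and can reproduce the asker's `pct_` column prefix through `names_prefix`. The following is a sketch of the same three-summary approach under those assumptions. It assumes dplyr >= 1.0 (for `summarise(.groups = )`) and tidyr >= 1.1 (for a scalar `values_fill`). The helper `pct_wide()` and its `by` argument are hypothetical names chosen for illustration, and the resulting columns are named like `pct_car` and `pct_car_red` rather than exactly `pct_car_rev`.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  name       = c(letters[1:4], letters[1:3], "a"),
  year       = c(rep(2012, 4), rep(2014, 4)),
  prod_type  = c("car", "car", "boat", "plane", "boat", "car", "plane", "plane"),
  prod_color = c("red", "blue", "red", "blue", "green", "red", "blue", "blue"),
  revenue    = c(1000, 2000, 4000, 5000, 9000, 2000, 6000, 10000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: share of revenue per category within each name/year pair,
# widened to one "pct_<category>" column per category. `by` is a character
# vector of category columns, e.g. "prod_type" or c("prod_type", "prod_color").
pct_wide <- function(data, by) {
  data %>%
    group_by(name, year, !!!rlang::syms(by)) %>%
    summarise(rev = sum(revenue), .groups = "drop") %>%
    group_by(name, year) %>%
    mutate(pct = rev / sum(rev)) %>%   # share of the name/year total
    ungroup() %>%
    select(-rev) %>%
    # Categories absent for a pair are filled with 0 instead of NA
    pivot_wider(names_from = all_of(by), values_from = pct,
                names_prefix = "pct_", values_fill = 0)
}

result <- df %>%
  group_by(name, year) %>%
  summarise(yr_total_rev = sum(revenue), .groups = "drop") %>%
  left_join(pct_wide(df, "prod_type"), by = c("name", "year")) %>%
  left_join(pct_wide(df, c("prod_type", "prod_color")), by = c("name", "year"))

print(result)
```

Passing the category columns as strings and converting them with `rlang::syms()` avoids forwarding bare names through `...` into both `group_by()` and `pivot_wider()`. With multiple `names_from` columns, `pivot_wider()` joins the values with `_`, which is why the combined columns look like `pct_car_red`.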