Percent of yearly group total by subcategory

Time: 2017-10-23 01:18:03

Tags: r dplyr data.table plyr reshape2

I'm trying to turn a data frame into summary data: yearly totals plus a percentage breakdown by subcategory. For example, if I have this data:

name year prod_type prod_color revenue
    a 2012       car        red    1000
    b 2012       car       blue    2000
    c 2012      boat        red    4000
    d 2012     plane       blue    5000
    a 2014      boat      green    9000
    b 2014       car        red    2000
    c 2014     plane       blue    6000
    a 2014     plane       blue   10000

I'd like to create a table like this:

 name year yr_total_rev pct_car_rev pct_boat_rev pct_plane_rev pct_red_car_rev pct_blue_car_rev
1    a 2012         1000          NA           NA            NA              NA               NA
2    a 2014        19000          NA           NA            NA              NA               NA
3    b 2012         2000          NA           NA            NA              NA               NA
4    b 2014         2000          NA           NA            NA              NA               NA
5    c 2012         4000          NA           NA            NA              NA               NA
6    c 2014         6000          NA           NA            NA              NA               NA
7    d 2012         5000          NA           NA            NA              NA               NA

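Except that instead of NA, each cell should hold the percentage of "yr_total_rev" for that name/year pair. For example, for a, car revenue would be 100% in 2012 but 0% in 2014, when boat and plane revenue would be roughly 47% and 53%, and so on.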
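Put another way, each cell is a subcategory's revenue divided by the total for its name/year pair. For a in 2014, boat is 9000/19000 and plane is 10000/19000. Below is a minimal base R sketch of that per-row share, using the sample data given below. ave() repeats each pair's total on every row of the pair. The added column names yr_total and share are only illustrative.

```r
# Sample data from the question (character columns kept as character)
df <- data.frame(
  name       = c(letters[1:4], letters[1:3], "a"),
  year       = c(rep(2012, 4), rep(2014, 4)),
  prod_type  = c("car", "car", "boat", "plane", "boat", "car", "plane", "plane"),
  prod_color = c("red", "blue", "red", "blue", "green", "red", "blue", "blue"),
  revenue    = c(1000, 2000, 4000, 5000, 9000, 2000, 6000, 10000),
  stringsAsFactors = FALSE
)

# Total revenue of each name/year pair, repeated on every row of that pair
df$yr_total <- ave(df$revenue, df$name, df$year, FUN = sum)

# Each row's share of its name/year total
df$share <- df$revenue / df$yr_total

# Show the rows for name "a"
df[df$name == "a", c("year", "prod_type", "revenue", "yr_total", "share")]
```

Every 2012 pair in the sample has a single row, so each of those shares is exactly 1.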

Thanks in advance for any help!

Sample data below:

df <- data.frame("name"=c(letters[1:4], c(letters[1:3], "a")), 
                 "year"=c(rep(2012,4), rep(2014, 4)),
                 "prod_type"=c("car","car","boat","plane","boat","car","plane","plane"),
                 "prod_color"=c("red","blue","red","blue","green","red","blue","blue"),
                 "revenue"=c(1000,2000,4000,5000,9000,2000,6000, 10000))

1 Answer:

Answer 0 (score: 3)

The code below joins three separate summaries:

library(tidyverse)

dat.summary = df %>% group_by(name, year) %>% 
  # 1. Total revenue per name/year pair
  summarise(yr_total=sum(revenue)) %>% 
  # 2. Share of that total by prod_type, one column per type
  left_join(df %>% group_by(name, year, prod_type) %>% 
      summarise(rev=sum(revenue)) %>% 
      group_by(name, year) %>% 
      mutate(Percent=rev/sum(rev)) %>%
      select(-rev) %>% 
      spread(prod_type, Percent),
    by = c("name", "year")) %>% 
  # 3. Share of that total by prod_type/prod_color combination
  left_join(df %>% group_by(name, year, prod_type, prod_color) %>% 
      summarise(rev=sum(revenue)) %>% 
      group_by(name, year) %>% 
      mutate(Percent=rev/sum(rev)) %>%
      unite(type_color, prod_type, prod_color) %>% 
      select(-rev) %>% 
      spread(type_color, Percent),
    by = c("name", "year"))

    name  year yr_total      boat   car     plane boat_green boat_red car_blue car_red plane_blue
1      a  2012     1000        NA     1        NA         NA       NA       NA       1         NA
2      a  2014    19000 0.4736842    NA 0.5263158  0.4736842       NA       NA      NA  0.5263158
3      b  2012     2000        NA     1        NA         NA       NA        1      NA         NA
4      b  2014     2000        NA     1        NA         NA       NA       NA       1         NA
5      c  2012     4000 1.0000000    NA        NA         NA        1       NA      NA         NA
6      c  2014     6000        NA    NA 1.0000000         NA       NA       NA      NA  1.0000000
7      d  2012     5000        NA    NA 1.0000000         NA       NA       NA      NA  1.0000000
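The boat, car and plane columns of that output can be cross-checked with a compact base R sketch. This is an alternative, not the answer's method. tapply() over two grouping vectors returns a matrix of summed revenue, with one row per name/year pair and one column per prod_type, and NA in empty cells. Dividing each row by its sum turns those sums into shares. The "a 2014"-style row labels are just an illustrative way to key the pairs.

```r
# Sample data from the question
df <- data.frame(
  name       = c(letters[1:4], letters[1:3], "a"),
  year       = c(rep(2012, 4), rep(2014, 4)),
  prod_type  = c("car", "car", "boat", "plane", "boat", "car", "plane", "plane"),
  prod_color = c("red", "blue", "red", "blue", "green", "red", "blue", "blue"),
  revenue    = c(1000, 2000, 4000, 5000, 9000, 2000, 6000, 10000),
  stringsAsFactors = FALSE
)

# Sum revenue into a matrix: rows = "name year", columns = prod_type (NA if no sales)
rev_by_type <- with(df, tapply(revenue, list(paste(name, year), prod_type), sum))

# Divide each row by its total (ignoring NAs) to get shares of the yearly total
pct_by_type <- rev_by_type / rowSums(rev_by_type, na.rm = TRUE)

round(pct_by_type, 4)
```

For a in 2014, the boat and plane shares come out at about 0.4737 and 0.5263, the same values as the answer's boat and plane columns.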

This can be shortened a bit by moving the repeated steps into a function:

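# Sums revenue over the given grouping columns, then computes each row's
# share of its name/year total. Note that it reads the global df.
fnc = function(...) {
  df %>% group_by(...) %>% 
    summarise(rev=sum(revenue)) %>% 
    group_by(name, year) %>% 
    mutate(Percent=rev/sum(rev))
}

# rename() keeps the total column called yr_total, as in the first version
dat.summary = fnc(name, year) %>% select(-Percent) %>% rename(yr_total=rev) %>% 
  left_join(fnc(name, year, prod_type) %>%
              select(-rev) %>% 
              spread(prod_type, Percent),
            by = c("name", "year")) %>% 
  left_join(fnc(name, year, prod_type, prod_color) %>%
              unite(type_color, prod_type, prod_color) %>% 
              select(-rev) %>% 
              spread(type_color, Percent),
            by = c("name", "year"))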
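The same helper-function idea also works in base R if the tidyverse is not an option. The sketch below is an alternative, not the answer's code. The helper name shares_wide is hypothetical. aggregate() and ave() stand in for summarise() and mutate(), reshape(direction = "wide") stands in for spread(), and merge(all.x = TRUE) stands in for left_join(). Columns come out named pct.car, pct.car_red and so on, because reshape() joins the value name and the level with a dot.

```r
# Sample data from the question
df <- data.frame(
  name       = c(letters[1:4], letters[1:3], "a"),
  year       = c(rep(2012, 4), rep(2014, 4)),
  prod_type  = c("car", "car", "boat", "plane", "boat", "car", "plane", "plane"),
  prod_color = c("red", "blue", "red", "blue", "green", "red", "blue", "blue"),
  revenue    = c(1000, 2000, 4000, 5000, 9000, 2000, 6000, 10000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each name/year pair, the share of revenue held by
# each level of the column named in `key`, spread to one column per level
shares_wide <- function(data, key) {
  tmp <- data.frame(name = data$name, year = data$year, k = data[[key]],
                    revenue = data$revenue, stringsAsFactors = FALSE)
  # Sum revenue for each name/year/level combination
  agg <- aggregate(revenue ~ name + year + k, data = tmp, FUN = sum)
  # Divide by the name/year total to get each combination's share
  agg$pct <- agg$revenue / ave(agg$revenue, agg$name, agg$year, FUN = sum)
  # One row per name/year, one pct.<level> column per level (NA where absent)
  reshape(agg[c("name", "year", "k", "pct")], idvar = c("name", "year"),
          timevar = "k", direction = "wide")
}

# Yearly total per name/year pair
totals <- aggregate(revenue ~ name + year, data = df, FUN = sum)
names(totals)[names(totals) == "revenue"] <- "yr_total"

# Combined type/color label, e.g. "car_red"
df$type_color <- paste(df$prod_type, df$prod_color, sep = "_")

# Left-join the share tables onto the totals
dat.summary <- merge(totals, shares_wide(df, "prod_type"),
                     by = c("name", "year"), all.x = TRUE)
dat.summary <- merge(dat.summary, shares_wide(df, "type_color"),
                     by = c("name", "year"), all.x = TRUE)
dat.summary
```

Because merge() sorts on the key columns by default, the rows come out ordered by name and then by year, as in the answer's output.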