Summarize the percentage of one column by the categories in another column in R

Posted: 2018-05-01 22:01:53

Tags: r dplyr aggregate plyr

I know this is basic, but I'm having trouble with it. I got this sample data from the following site:

Link to article containing sample data

companiesData <- data.frame(fy = c(2010,2011,2012,2010,2011,2012,2010,2011,2012),
                            company = c("Apple","Apple","Apple","Google","Google","Google",
                                        "Microsoft","Microsoft","Microsoft"),
                            revenue = c(65225,108249,156508,29321,37905,50175,
                                        62484,69943,73723), 
                            profit = c(14013,25922,41733,8505,9737,10737,
                                       18760,23150,16978))

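How do I find each company's profit percentage for each year? For example, add up all of Apple's profits, then, for each Apple row, compute that row's profit as a percentage of this total. The end result should be a table with all of the original columns plus the percentage profit, where the percentage is computed within each company only. The years stay as they are. The answer for Apple's first row is 17.16%, calculated as: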

(14013/81668)*100

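where 81668 is Apple's total profit and 17.16% is the profit percentage for Apple's first row, which is 2010. I don't want to treat this as a time series, because the other variable won't necessarily be time. It could be a location.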
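A quick check of the arithmetic above in R. The block copies only Apple's three profit values from companiesData so that it runs on its own; the variable names are illustrative.

```r
# Apple's three yearly profits (2010, 2011, 2012), copied from companiesData
apple_profit <- c(14013, 25922, 41733)

# Apple's total profit across all years: 14013 + 25922 + 41733
apple_total <- sum(apple_profit)

# Share of the first row (fy 2010) as a percentage, rounded to 2 decimals
first_pct <- round(apple_profit[1] / apple_total * 100, 2)

apple_total  # 81668
first_pct    # 17.16
```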

2 answers:

Answer 0 (score: 1)

Using base R:

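# Express each value as a percentage of its group's total, rounded to 2 decimals, with "%" appended
fun <- function(x) paste0(round(x / sum(x) * 100, 2), "%")
# ave() applies fun separately within each company and returns one value per row
transform(companiesData, prec = ave(profit, company, FUN = fun))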
    fy   company revenue profit   prec
1 2010     Apple   65225  14013 17.16%
2 2011     Apple  108249  25922 31.74%
3 2012     Apple  156508  41733  51.1%
4 2010    Google   29321   8505 29.35%
5 2011    Google   37905   9737  33.6%
6 2012    Google   50175  10737 37.05%
7 2010 Microsoft   62484  18760 31.86%
8 2011 Microsoft   69943  23150 39.31%
9 2012 Microsoft   73723  16978 28.83%
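A variant of the base-R approach that keeps prec numeric instead of pasting on a "%" (which turns the column into character), so it can still be sorted or summed. This is an alternative sketch, not part of the original answer: it swaps the custom function for prop.table(), which divides a vector by its own sum.

```r
# Recreate the question's sample data so the block runs on its own
companiesData <- data.frame(
  fy      = rep(c(2010, 2011, 2012), times = 3),
  company = rep(c("Apple", "Google", "Microsoft"), each = 3),
  revenue = c(65225, 108249, 156508, 29321, 37905, 50175, 62484, 69943, 73723),
  profit  = c(14013, 25922, 41733, 8505, 9737, 10737, 18760, 23150, 16978)
)

# prop.table(x) returns x / sum(x); ave() applies it within each company
companiesData$prec <- round(
  ave(companiesData$profit, companiesData$company, FUN = prop.table) * 100, 2
)
companiesData
```

The prec values match the percentages shown above (17.16, 31.74, 51.1, ...), but stay numeric.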


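# Alternatively, with data.table (setDT() converts companiesData to a data.table in place)
library(data.table)
# := adds prec by reference, computed within each company; the trailing [] prints the result
setDT(companiesData)[, prec := profit / sum(profit) * 100, by = company][]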
     fy   company revenue profit     prec
1: 2010     Apple   65225  14013 17.15850
2: 2011     Apple  108249  25922 31.74071
3: 2012     Apple  156508  41733 51.10080
4: 2010    Google   29321   8505 29.34884
5: 2011    Google   37905   9737 33.60019
6: 2012    Google   50175  10737 37.05097
7: 2010 Microsoft   62484  18760 31.85708
8: 2011 Microsoft   69943  23150 39.31191
9: 2012 Microsoft   73723  16978 28.83100
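Whichever answer is used, a simple way to confirm the grouping is right is to check that each company's percentages add up to 100. A base-R sketch of that check (so it runs without data.table), using only the company and profit columns from the question:

```r
# Minimal copy of the columns needed, taken from the question's data
company <- rep(c("Apple", "Google", "Microsoft"), each = 3)
profit  <- c(14013, 25922, 41733, 8505, 9737, 10737, 18760, 23150, 16978)

# Per-row percentage of the company total, same formula as the answers above
prec <- profit / ave(profit, company, FUN = sum) * 100

# Each company's percentages should sum to 100 (up to floating-point error)
totals <- tapply(prec, company, sum)
stopifnot(all(abs(totals - 100) < 1e-8))
totals
```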

Answer 1 (score: 1)

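A dplyr solution: group by company, add up all of each company's profits, then create a new variable holding each year's profit as a share of that total.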
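Since the question is also tagged aggregate, the same three steps can be sketched in base R with aggregate() and merge() instead of group_by() and mutate(). This is an alternative, not part of the original answer; it reuses the dplyr answer's column names total_profit and share_this_yr. merge() may reorder rows, so the result is re-sorted by company and year.

```r
# Recreate the question's sample data so the block runs on its own
companiesData <- data.frame(
  fy      = rep(c(2010, 2011, 2012), times = 3),
  company = rep(c("Apple", "Google", "Microsoft"), each = 3),
  revenue = c(65225, 108249, 156508, 29321, 37905, 50175, 62484, 69943, 73723),
  profit  = c(14013, 25922, 41733, 8505, 9737, 10737, 18760, 23150, 16978)
)

# Step 1: one row per company holding the sum of its profits
totals <- aggregate(profit ~ company, data = companiesData, FUN = sum)
names(totals)[names(totals) == "profit"] <- "total_profit"

# Step 2: attach each company's total to all of that company's rows
out <- merge(companiesData, totals, by = "company")
out <- out[order(out$company, out$fy), ]

# Step 3: each row's profit as a share (0 to 1) of its company's total
out$share_this_yr <- out$profit / out$total_profit
out
```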

library(dplyr)

# companiesData is created as in the question (that code is omitted here)

companiesData %>%
    group_by(company) %>%
    mutate(total_profit = sum(profit)) %>%
    mutate(share_this_yr = profit / total_profit)
#> # A tibble: 9 x 6
#> # Groups:   company [3]
#>      fy company   revenue profit total_profit share_this_yr
#>   <dbl> <fct>       <dbl>  <dbl>        <dbl>         <dbl>
#> 1  2010 Apple       65225  14013        81668         0.172
#> 2  2011 Apple      108249  25922        81668         0.317
#> 3  2012 Apple      156508  41733        81668         0.511
#> 4  2010 Google      29321   8505        28979         0.293
#> 5  2011 Google      37905   9737        28979         0.336
#> 6  2012 Google      50175  10737        28979         0.371
#> 7  2010 Microsoft   62484  18760        58888         0.319
#> 8  2011 Microsoft   69943  23150        58888         0.393
#> 9  2012 Microsoft   73723  16978        58888         0.288

Created on 2018-05-01 by the reprex package (v0.2.0).