Calculate percentage by column value

Time: 2014-08-05 08:31:33

Tags: r

I have the following data frame of registered vehicles in Switzerland:

Fuel     Year   Region   Count
Gasoline 2013       GE  169600
Diesel   2013       GE   46790
Hybrid   2013       GE    2268
Electric 2013       GE      85
Other    2013       GE     532
Gasoline 2013       VS  149232
Diesel   2013       VS   50591
Hybrid   2013       VS    1028
Electric 2013       VS     268
Other    2013       VS     261

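I would like to add an extra "Pct" column to the data frame, giving each fuel type's share of the total within its year and region, but I am having trouble working out how. The result should be: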

Fuel     Year   Region   Count  Pct
Gasoline 2013       GE  169600  0.7734
Diesel   2013       GE   46790  0.2134
Hybrid   2013       GE    2268  0.0103
Electric 2013       GE      85  0.0004
Other    2013       GE     532  0.0024
Gasoline 2013       VS  149232  0.7410
Diesel   2013       VS   50591  0.2512
Hybrid   2013       VS    1028  0.0051
Electric 2013       VS     268  0.0013
Other    2013       VS     261  0.0013

3 answers:

Answer 0 (score: 4)

This is a good use case for ave, followed by simple vector division:

# load your data
d <- read.table(text="Fuel     Year   Region   Count
Gasoline 2013       GE  169600
Diesel   2013       GE   46790
Hybrid   2013       GE    2268
Electric 2013       GE      85
Other    2013       GE     532
Gasoline 2013       VS  149232
Diesel   2013       VS   50591
Hybrid   2013       VS    1028
Electric 2013       VS     268
Other    2013       VS     261", header = TRUE)

# `ave` by groups and divide
d$Pct <- d$Count/with(d, ave(Count, list(Year, Region), FUN = sum))
# or, equivalently:
# d <- within(d, Pct <- Count/ave(Count, list(Year, Region), FUN = sum))

Result:

> d
       Fuel Year Region  Count          Pct
1  Gasoline 2013     GE 169600 0.7734579865
2    Diesel 2013     GE  46790 0.2133850188
3    Hybrid 2013     GE   2268 0.0103431764
4  Electric 2013     GE     85 0.0003876411
5     Other 2013     GE    532 0.0024261772
6  Gasoline 2013     VS 149232 0.7410467772
7    Diesel 2013     VS  50591 0.2512215712
8    Hybrid 2013     VS   1028 0.0051047770
9  Electric 2013     VS    268 0.0013308174
10    Other 2013     VS    261 0.0012960572
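
To make the two steps in this answer easier to follow, here is a minimal sketch that keeps the intermediate group totals in their own vector before dividing. The names `totals` and `pct` and the `GroupTotal` column are illustrative only, and the data frame is rebuilt inline with `data.frame()` instead of `read.table()` so the snippet runs on its own.

```r
# Same data as in the question, rebuilt inline so the sketch is self-contained
d <- data.frame(
  Fuel   = rep(c("Gasoline", "Diesel", "Hybrid", "Electric", "Other"), times = 2),
  Year   = 2013L,
  Region = rep(c("GE", "VS"), each = 5),
  Count  = c(169600L, 46790L, 2268L, 85L, 532L,
             149232L, 50591L, 1028L, 268L, 261L)
)

# ave() returns one value per row: FUN applied to that row's Year/Region group,
# so `totals` has the same length as d$Count
totals <- ave(d$Count, d$Year, d$Region, FUN = sum)

# Dividing two vectors of equal length gives each row's share of its group total
pct <- d$Count / totals

# Show the intermediate totals next to the resulting shares
print(data.frame(d, GroupTotal = totals, Pct = pct))
```

Because `ave()` keeps the original row order and length, no merge or reordering is needed before the division.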

Answer 1 (score: 2)

Using data.table, where d is the dataset:

 library(data.table)
 setDT(d)[,Pct:=round(Count/sum(Count),4), by=list(Year, Region)]
 d
#         Fuel Year Region  Count    Pct
#  1: Gasoline 2013     GE 169600 0.7735
#  2:   Diesel 2013     GE  46790 0.2134
#  3:   Hybrid 2013     GE   2268 0.0103
#  4: Electric 2013     GE     85 0.0004
#  5:    Other 2013     GE    532 0.0024
#  6: Gasoline 2013     VS 149232 0.7410
#  7:   Diesel 2013     VS  50591 0.2512
#  8:   Hybrid 2013     VS   1028 0.0051
#  9: Electric 2013     VS    268 0.0013
# 10:    Other 2013     VS    261 0.0013
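
One thing worth checking with this approach is that `round()` is applied before the value is stored, so the column holds rounded shares: in the output above, the five VS values add up to 0.9999, not 1. The sketch below is an illustration in base R (so it runs without data.table installed), with the hypothetical names `share`, `exact_sums` and `rounded_sums`. It compares the per-group sums of exact and rounded shares, and rounds only when printing.

```r
# Same data as in the question, rebuilt inline so the sketch is self-contained
d <- data.frame(
  Fuel   = rep(c("Gasoline", "Diesel", "Hybrid", "Electric", "Other"), times = 2),
  Year   = 2013L,
  Region = rep(c("GE", "VS"), each = 5),
  Count  = c(169600L, 46790L, 2268L, 85L, 532L,
             149232L, 50591L, 1028L, 268L, 261L)
)

# Exact share of each row within its Year/Region group
share <- d$Count / ave(d$Count, d$Year, d$Region, FUN = sum)

# Group key combining Year and Region (levels look like "2013.GE")
grp <- interaction(d$Year, d$Region)

# Sum the shares per group, once at full precision and once after rounding
exact_sums   <- tapply(share, grp, sum)
rounded_sums <- tapply(round(share, 4), grp, sum)
print(exact_sums)
print(rounded_sums)

# Store full precision; round only for display
d$Pct <- share
print(format(round(d$Pct, 4), nsmall = 4))
```

If the column will feed further calculations, storing the unrounded value and formatting at print time avoids carrying the rounding error forward.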

Answer 2 (score: 0)

Using the dplyr package, which has soared in popularity since this question was originally asked:

library(dplyr)
d %>%
  group_by(Year, Region) %>%
  mutate(Pct = Count / sum(Count))

# A tibble: 10 x 5
# Groups:   Year, Region [2]
   Fuel      Year Region  Count      Pct
   <fct>    <int> <fct>   <int>    <dbl>
 1 Gasoline  2013 GE     169600 0.773   
 2 Diesel    2013 GE      46790 0.213   
 3 Hybrid    2013 GE       2268 0.0103  
 4 Electric  2013 GE         85 0.000388
 5 Other     2013 GE        532 0.00243 
 6 Gasoline  2013 VS     149232 0.741   
 7 Diesel    2013 VS      50591 0.251   
 8 Hybrid    2013 VS       1028 0.00510 
 9 Electric  2013 VS        268 0.00133 
10 Other     2013 VS        261 0.00130
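
Note that the pipeline above prints its result without saving it, and the printed tibble is still grouped (see the `# Groups:` line). A possible variant, assuming dplyr is installed, assigns the result and drops the grouping with `ungroup()` so that later operations are not silently applied per group. The name `res` is illustrative only.

```r
library(dplyr)

# Same data as in the question, rebuilt inline so the sketch is self-contained
d <- data.frame(
  Fuel   = rep(c("Gasoline", "Diesel", "Hybrid", "Electric", "Other"), times = 2),
  Year   = 2013L,
  Region = rep(c("GE", "VS"), each = 5),
  Count  = c(169600L, 46790L, 2268L, 85L, 532L,
             149232L, 50591L, 1028L, 268L, 261L)
)

res <- d %>%
  group_by(Year, Region) %>%           # one group per Year/Region pair
  mutate(Pct = Count / sum(Count)) %>% # sum(Count) is evaluated within each group
  ungroup()                            # drop the grouping before further use

print(res)
```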