我有关于瑞士注册车辆的以下数据框架:
Fuel Year Region Count
Gasoline 2013 GE 169600
Diesel 2013 GE 46790
Hybrid 2013 GE 2268
Electric 2013 GE 85
Other 2013 GE 532
Gasoline 2013 VS 149232
Diesel 2013 VS 50591
Hybrid 2013 VS 1028
Electric 2013 VS 268
Other 2013 VS 261
我想在数据框中添加一个额外的“Pct”列,按年份和地区的燃料类型百分比,但我有一些困难要理解。结果应该是:
Fuel Year Region Count Pct
Gasoline 2013 GE 169600 0.7734
Diesel 2013 GE 46790 0.2134
Hybrid 2013 GE 2268 0.0134
Electric 2013 GE 85 0.0004
Other 2013 GE 532 0.0024
Gasoline 2013 VS 149232 0.7410
Diesel 2013 VS 50591 0.2512
Hybrid 2013 VS 1028 0.0051
Electric 2013 VS 268 0.0013
Other 2013 VS 261 0.0013
答案 0 :(得分:4)
这是ave
的一个很好的用例,然后是简单的向量分割:
# load your data
d <- read.table(text="Fuel Year Region Count
Gasoline 2013 GE 169600
Diesel 2013 GE 46790
Hybrid 2013 GE 2268
Electric 2013 GE 85
Other 2013 GE 532
Gasoline 2013 VS 149232
Diesel 2013 VS 50591
Hybrid 2013 VS 1028
Electric 2013 VS 268
Other 2013 VS 261", header = TRUE)
# `ave` by groups and divide
d$Pct <- d$Count/with(d, ave(Count, list(Year, Region), FUN = sum))
# or, equivalently:
# d <- within(d, Pct <- Count/ave(Count, list(Year, Region), FUN = sum))
结果:
> d
Fuel Year Region Count Pct
1 Gasoline 2013 GE 169600 0.7734579865
2 Diesel 2013 GE 46790 0.2133850188
3 Hybrid 2013 GE 2268 0.0103431764
4 Electric 2013 GE 85 0.0003876411
5 Other 2013 GE 532 0.0024261772
6 Gasoline 2013 VS 149232 0.7410467772
7 Diesel 2013 VS 50591 0.2512215712
8 Hybrid 2013 VS 1028 0.0051047770
9 Electric 2013 VS 268 0.0013308174
10 Other 2013 VS 261 0.0012960572
答案 1 :(得分:2)
使用data.table
。如果d
是数据集
library(data.table)
setDT(d)[,Pct:=round(Count/sum(Count),4), by=list(Year, Region)]
d
# Fuel Year Region Count Pct
# 1: Gasoline 2013 GE 169600 0.7735
# 2: Diesel 2013 GE 46790 0.2134
# 3: Hybrid 2013 GE 2268 0.0103
# 4: Electric 2013 GE 85 0.0004
# 5: Other 2013 GE 532 0.0024
# 6: Gasoline 2013 VS 149232 0.7410
# 7: Diesel 2013 VS 50591 0.2512
# 8: Hybrid 2013 VS 1028 0.0051
# 9: Electric 2013 VS 268 0.0013
# 10: Other 2013 VS 261 0.0013
答案 2 :(得分:0)
使用dplyr软件包,该软件包自从最初提出这个问题以来就开始飙升:
library(dplyr)
d %>%
group_by(Year, Region) %>%
mutate(Pct = Count / sum(Count))
# A tibble: 10 x 5
# Groups: Year, Region [2]
Fuel Year Region Count Pct
<fct> <int> <fct> <int> <dbl>
1 Gasoline 2013 GE 169600 0.773
2 Diesel 2013 GE 46790 0.213
3 Hybrid 2013 GE 2268 0.0103
4 Electric 2013 GE 85 0.000388
5 Other 2013 GE 532 0.00243
6 Gasoline 2013 VS 149232 0.741
7 Diesel 2013 VS 50591 0.251
8 Hybrid 2013 VS 1028 0.00510
9 Electric 2013 VS 268 0.00133
10 Other 2013 VS 261 0.00130