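I have a problem with my data frame: the same person ID appears several times, with different expenses in categories such as markets, health, cars, etc. My data frame looks like this: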
Base=data.frame(ID=c("CED1","CED2","CED3","CED1","CED1","CED3","CED3","CED2","CED2","CED4"),Value=c(10,20,10,30,50,10,10,20,30,30),Categorie=c("Markets","Markets","Health","Cars","Cars","Health","Cars","Health","Cars","Markets"))
ID Value Categorie
1 CED1 10 Markets
2 CED2 20 Markets
3 CED3 10 Health
4 CED1 30 Cars
5 CED1 50 Cars
6 CED3 10 Health
7 CED3 10 Cars
8 CED2 20 Health
9 CED2 30 Cars
10 CED4 30 Markets
As you can see, there are repeated IDs and several categories. I want a new data frame with one row per person and these indicators:
ID Total.Value Max.Value Min.Value Average.Value %Markets %Health %Cars
CED1 90 50 10 30 11% 0% 89%
CED2 70 30 20 23.33 28.6% 28.6% 42.9%
CED3 30 10 10 10 0% 66.7% 33.3%
CED4 30 30 30 30 100% 0% 0%
I'm trying to build this data frame with plyr, but I'm not getting the right indicators. Thanks for your help.
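Each percentage column is a person's spending in one category divided by that person's total spending, times 100. Written out for person $i$ and category $c$, with CED1 (10 on Markets, 30 + 50 on Cars) as a worked example:

```latex
\%c_i = 100 \times
  \frac{\sum_{k \,:\, \mathrm{ID}_k = i,\ \mathrm{Categorie}_k = c} \mathrm{Value}_k}
       {\sum_{k \,:\, \mathrm{ID}_k = i} \mathrm{Value}_k}
\qquad
\%\mathrm{Cars}_{\mathrm{CED1}} = 100 \times \frac{30 + 50}{10 + 30 + 50}
  = 100 \times \frac{80}{90} \approx 88.9
```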
Answer 0 (score: 3)
Here is a ddply solution:
library(plyr)
# One row per ID: summary statistics plus each category's share of the total
ddply(Base, .(ID), summarise,
      Total = sum(Value),
      Max.Value = max(Value),
      Min.Value = min(Value),
      Average.Value = mean(Value),
      "%Markets" = sum(Value[Categorie == "Markets"]) / sum(Value) * 100,
      "%Health"  = sum(Value[Categorie == "Health"])  / sum(Value) * 100,
      "%Cars"    = sum(Value[Categorie == "Cars"])    / sum(Value) * 100)
Result:
ID Total Max.Value Min.Value Average.Value %Markets %Health %Cars
1 CED1 90 50 10 30.00000 11.11111 0.00000 88.88889
2 CED2 70 30 20 23.33333 28.57143 28.57143 42.85714
3 CED3 30 10 10 10.00000 0.00000 66.66667 33.33333
4 CED4 30 30 30 30.00000 100.00000 0.00000 0.00000
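The ddply call needs one line per category. As a minimal base-R sketch (not from either answer), `xtabs()` and `prop.table()` can build one percentage column per category automatically, so new categories need no code changes. The variable names `per_id`, `sums`, `pct` and `result` are illustrative, and the column names follow the question's desired output:

```r
# Sketch: base-R alternative that derives the percentage columns automatically
Base <- data.frame(
  ID = c("CED1","CED2","CED3","CED1","CED1","CED3","CED3","CED2","CED2","CED4"),
  Value = c(10,20,10,30,50,10,10,20,30,30),
  Categorie = c("Markets","Markets","Health","Cars","Cars",
                "Health","Cars","Health","Cars","Markets")
)

# Total, max, min and mean of Value for each ID (one row per ID)
per_id <- t(sapply(split(Base$Value, Base$ID), function(v)
  c(Total.Value = sum(v), Max.Value = max(v),
    Min.Value = min(v), Average.Value = mean(v))))

# ID x Categorie table of summed Value; missing combinations are 0, not NA
sums <- xtabs(Value ~ ID + Categorie, data = Base)

# Row-wise share of each category in percent, prefixed with "%"
pct <- as.data.frame.matrix(prop.table(sums, margin = 1) * 100)
names(pct) <- paste0("%", names(pct))

# Both pieces are sorted by ID, so they can be bound side by side
result <- data.frame(ID = rownames(per_id), per_id, pct,
                     row.names = NULL, check.names = FALSE)
print(result)
```

Because `xtabs()` fills empty cells with 0, this version never produces NA percentages.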
Answer 1 (score: 1)
Here is a data.table solution:
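library(data.table)
# tapply() returns every category (NA where a person has none) only when
# Categorie is a factor; since R 4.0.0 data.frame() keeps strings as character,
# and then groups would return different numbers of columns
Base$Categorie <- factor(Base$Categorie)
dt <- data.table(Base, key = "ID")
dt[, as.list(c(total = sum(Value), max = max(Value),
               min = min(Value), mean = mean(Value),
               tapply(Value, Categorie, sum) / sum(Value) * 100)),
   by = ID]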
# ID total max min mean Cars Health Markets
# 1: CED1 90 50 10 30.00000 88.88889 NA 11.11111
# 2: CED2 70 30 20 23.33333 42.85714 28.57143 28.57143
# 3: CED3 30 10 10 10.00000 33.33333 66.66667 NA
# 4: CED4 30 30 30 30.00000 NA NA 100.00000
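The NA cells mean that person spent nothing in that category. A sketch of turning them into 0 after the fact, modifying the result in place with data.table's `set()`; it assumes the result is stored in a variable `res` (a name chosen here for illustration) and rebuilds the example data so it runs on its own:

```r
library(data.table)

Base <- data.frame(
  ID = c("CED1","CED2","CED3","CED1","CED1","CED3","CED3","CED2","CED2","CED4"),
  Value = c(10,20,10,30,50,10,10,20,30,30),
  Categorie = c("Markets","Markets","Health","Cars","Cars",
                "Health","Cars","Health","Cars","Markets")
)
Base$Categorie <- factor(Base$Categorie)  # so tapply() reports all categories
dt <- data.table(Base, key = "ID")

res <- dt[, as.list(c(total = sum(Value), max = max(Value),
                      min = min(Value), mean = mean(Value),
                      tapply(Value, Categorie, sum) / sum(Value) * 100)),
          by = ID]

# Overwrite the NA cells of each category column with 0, without copying res
for (j in levels(Base$Categorie))
  set(res, i = which(is.na(res[[j]])), j = j, value = 0)
print(res)
```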
You can replace these NAs with 0 afterwards. If you'd rather get 0 directly instead of NA, do it inside j:
dt[, {
     tt <- tapply(Value, Categorie, sum) / sum(Value)  ## share of each category
     tt[is.na(tt)] <- 0                                ## no spending -> 0 instead of NA
     as.list(c(total = sum(Value),                     ## total
               summary(Value)[c(6, 1, 4)],             ## max, min and mean
               tt * 100))                              ## percentages
   },
   by = ID]
# ID total Max. Min. Mean Cars Health Markets
# 1: CED1 90 50 10 30.00 88.88889 0.00000 11.11111
# 2: CED2 70 30 20 23.33 42.85714 28.57143 28.57143
# 3: CED3 30 10 10 10.00 33.33333 66.66667 0.00000
# 4: CED4 30 30 30 30.00 0.00000 0.00000 100.00000
Here I also show how to use the summary function to get the max, min and mean in one call instead of writing each one out.
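To see why `c(6, 1, 4)` picks the max, min and mean, here is what `summary()` returns for CED1's values. Indexing by name is an equivalent option that does not depend on the position of each element:

```r
v <- c(10, 30, 50)   # CED1's values
s <- summary(v)

names(s)
# "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."

s[c(6, 1, 4)]                   # Max., Min., Mean by position
s[c("Max.", "Min.", "Mean")]    # the same three values by name
```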