I have a dataset in R of weekly allowances for students, by class, that looks like this:
Year ID Class Allowance
2013 123 Freshman 100
2013 234 Freshman 110
2013 345 Sophomore 150
2013 456 Sophomore 200
2013 567 Junior 250
2014 678 Junior 100
2014 789 Junior 230
2014 890 Freshman 110
2014 891 Freshman 250
2014 892 Sophomore 220
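How can I summarize the results by group (Year/Class) to get both the sum and the % (by group)? Getting the sum seems easy with ddply, but I can't get the % by group part right.
It works for the sum: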
summary <- ddply(my_data, .(Year, Class), summarize, Sum_Allow=sum(Allowance))
But it does not work for the percentage by group part:
summary <- ddply(my_data, .(Year, Class), summarize, Sum_Allow=sum(Allowance),
Allow_Pct=Allowance/sum(Allowance))
The desired result should look like this:
Year Class Sum_Allow Allow_Pct
2013 Freshman 210 26%
2013 Junior 250 31%
2013 Sophomore 350 43%
2014 Freshman 360 40%
2014 Junior 330 36%
2014 Sophomore 220 24%
I tried ddply from the plyr package, but please let me know of any approach that might work.
Answer 0 (score: 7)
Here is a possible solution using the data.table package (assuming your data is called df):
library(data.table)
setDT(df)[, list(Sum_Allow = sum(Allowance)), keyby = list(Year, Class)][,
Allow_Pct := paste0(round(Sum_Allow/sum(Sum_Allow), 2)*100, "%"), by = Year][]
# Year Class Sum_Allow Allow_Pct
# 1: 2013 Freshman 210 26%
# 2: 2013 Junior 250 31%
# 3: 2013 Sophomore 350 43%
# 4: 2014 Freshman 360 40%
# 5: 2014 Junior 330 36%
# 6: 2014 Sophomore 220 24%
Credit to @rawr, here is a possible base R solution:
df2 <- aggregate(Allowance ~ Class + Year, df, sum)
transform(df2, Allow_pct = ave(Allowance, Year, FUN = function(x) paste0(round(x/sum(x), 2)*100, "%")))
# Class Year Allowance Allow_pct
# 1 Freshman 2013 210 26%
# 2 Junior 2013 250 31%
# 3 Sophomore 2013 350 43%
# 4 Freshman 2014 360 40%
# 5 Junior 2014 330 36%
# 6 Sophomore 2014 220 24%
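As a cross-check on the figures above, here is a minimal base R sketch that uses a different technique from the answer: xtabs() builds the Year by Class table of sums, and prop.table() with margin = 1 divides each row by its yearly total. It assumes the question's sample data is recreated as df; the names tab and pct are only illustrative.

```r
# Recreate the sample data from the question (assumed to match it exactly)
df <- read.table(header = TRUE, text = "Year ID Class Allowance
2013 123 Freshman 100
2013 234 Freshman 110
2013 345 Sophomore 150
2013 456 Sophomore 200
2013 567 Junior 250
2014 678 Junior 100
2014 789 Junior 230
2014 890 Freshman 110
2014 891 Freshman 250
2014 892 Sophomore 220")

# Sum Allowance for every Year/Class combination (rows = Year, columns = Class)
tab <- xtabs(Allowance ~ Year + Class, data = df)

# Divide each row (year) by its row total, then express as whole percentages
pct <- round(prop.table(tab, margin = 1) * 100)
pct
```

The result is a wide table (one row per year) rather than the long layout shown in the answer, so it is mainly useful for eyeballing that each year's shares add up.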
Answer 1 (score: 4)
You can do this in two steps:
my_data <- read.table(header = TRUE,
text = "Year ID Class Allowance
2013 123 Freshman 100
2013 234 Freshman 110
2013 345 Sophomore 150
2013 456 Sophomore 200
2013 567 Junior 250
2014 678 Junior 100
2014 789 Junior 230
2014 890 Freshman 110
2014 891 Freshman 250
2014 892 Sophomore 220")
library(plyr)
(summ <- ddply(my_data, .(Year, Class), summarize, Sum_Allow=sum(Allowance)))
# Year Class Sum_Allow
# 1 2013 Freshman 210
# 2 2013 Junior 250
# 3 2013 Sophomore 350
# 4 2014 Freshman 360
# 5 2014 Junior 330
# 6 2014 Sophomore 220
ddply(summ, .(Year), mutate, Allow_pct = Sum_Allow / sum(Sum_Allow) * 100)
# Year Class Sum_Allow Allow_pct
# 1 2013 Freshman 210 25.92593
# 2 2013 Junior 250 30.86420
# 3 2013 Sophomore 350 43.20988
# 4 2014 Freshman 360 39.56044
# 5 2014 Junior 330 36.26374
# 6 2014 Sophomore 220 24.17582
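The Allow_pct column above is numeric. If the "26%" style from the question is wanted, one option is sprintf(). This is only a sketch in base R: it assumes the percentages from the output above are retyped by hand into a vector called allow_pct, and both allow_pct and allow_lbl are names used only here.

```r
# Percentages copied from the ddply output above (illustrative vector)
allow_pct <- c(25.92593, 30.86420, 43.20988, 39.56044, 36.26374, 24.17582)

# "%.0f" rounds to zero decimals; "%%" prints a literal percent sign
allow_lbl <- sprintf("%.0f%%", allow_pct)
allow_lbl
```

Formatting turns the column into character strings, so it is probably best done last, after any arithmetic on the numeric values.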
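I don't know if this happens to the rest of you, but when I run the original attempt, R crashes instead of giving a warning. It also crashes if I misspell Allowance. I really hate that; hadley, please fix.
Base R forever.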
Answer 2 (score: 3)
So, assuming that is what you want, this does it in dplyr:
library(dplyr)
my_data <- read.table(header = TRUE,
text =
'Year ID Class Allowance
2013 123 Freshman 100
2013 234 Freshman 110
2013 345 Sophomore 150
2013 456 Sophomore 200
2013 567 Junior 250
2014 678 Junior 100
2014 789 Junior 230
2014 890 Freshman 110
2014 891 Freshman 250
2014 892 Sophomore 220')
summary <- my_data %>%
group_by(Year) %>%
summarise(Year_Sum_Allow = sum(Allowance)) %>%
left_join(x = my_data, y = ., by = 'Year') %>%
group_by(Year, Class) %>%
summarise(Sum_Allow = sum(Allowance),
Allow_Pct = Sum_Allow/first(Year_Sum_Allow))
summary
# Results
# Source: local data frame [6 x 4]
# Groups: Year
#   Year Class Sum_Allow Allow_Pct
# 1 2013 Freshman 210 0.2592593
# 2 2013 Junior 250 0.3086420
# 3 2013 Sophomore 350 0.4320988
# 4 2014 Freshman 360 0.3956044
# 5 2014 Junior 330 0.3626374
# 6 2014 Sophomore 220 0.2417582
If you are not familiar with dplyr, the syntax may look strange. I recommend reading the introduction; it saves a lot of time.
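Edit: I should add that if you want the nice percentage format from the example output, you can replace the last line with Allow_Pct = paste0(round(Sum_Allow/first(Year_Sum_Allow) * 100), '%'). The ratio has to be multiplied by 100 before rounding; otherwise the result reads "0.26%" rather than "26%".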
Edit 2: As jbaums pointed out, this can be simplified to:
my_data %>%
group_by(Year, Class) %>%
summarise(sum_allow=sum(Allowance)) %>%
mutate(pct_allow=sum_allow/sum(sum_allow))
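The shorter version works because summarise() removes the last grouping variable (Class) from its result, so the mutate() that follows runs within each Year. The sketch below spells that out with the .groups argument, which assumes dplyr 1.0.0 or later; the name res is only illustrative.

```r
library(dplyr)

# Recreate the sample data from the question
my_data <- read.table(header = TRUE, text = "Year ID Class Allowance
2013 123 Freshman 100
2013 234 Freshman 110
2013 345 Sophomore 150
2013 456 Sophomore 200
2013 567 Junior 250
2014 678 Junior 100
2014 789 Junior 230
2014 890 Freshman 110
2014 891 Freshman 250
2014 892 Sophomore 220")

res <- my_data %>%
  group_by(Year, Class) %>%
  # "drop_last" keeps the summarised result grouped by Year only
  summarise(sum_allow = sum(Allowance), .groups = "drop_last") %>%
  # sum(sum_allow) is therefore the yearly total, not the grand total
  mutate(pct_allow = sum_allow / sum(sum_allow))

# Inspect which grouping variable is still attached to the result
group_vars(res)
```

If the result is passed on to further steps, adding ungroup() at the end avoids later verbs being applied per year by accident.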