R: aggregate a data column by multiple criteria

Date: 2016-02-17 14:42:34

Tags: r aggregate

I am trying to aggregate data by two categories.

Data

year    person  expense sex money
2011    kevin   truck   M   1
2011    mike    truck   M   62
2011    sally   truck   F   60
2012    kevin   truck   M   37
2012    mike    truck   M   53
2012    sally   truck   F   95
2013    kevin   truck   M   21
2013    mike    truck   M   13
2013    sally   truck   F   38
2014    kevin   truck   M   48
2014    mike    truck   M   4
2014    sally   truck   F   77
2011    kevin   house   M   7
2011    mike    house   M   94
2011    sally   house   F   79
2012    kevin   house   M   86
2012    mike    house   M   42
2012    sally   house   F   46
2013    kevin   house   M   90
2013    mike    house   M   76
2013    sally   house   F   75
2014    kevin   house   M   70
2014    mike    house   M   91
2014    sally   house   F   62

I want to sum the money column over all rows where the year and person columns match.

Desired output

year    person  sex money
2011    kevin   M   8
2011    mike    M   156
2011    sally   F   139
2012    kevin   M   123
2012    mike    M   95
2012    sally   F   141
2013    kevin   M   111
2013    mike    M   89
2013    sally   F   113
2014    kevin   M   118
2014    mike    M   95
2014    sally   F   139

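How can I do this?

I tried data_aggregated = aggregate(data$money, by = list(name = data$name), FUN = sum), but that only aggregated my data by name. I don't know how to aggregate it by both name and year. It also creates a data frame containing only two columns: name and x.

I also tried concatenating name + year into an ID variable, but that seemed too tedious.

2 Answers:

Answer 0: (score: 4)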

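You can choose between the formula route and the list method below. If you also want the sex column, add it as a grouping variable like the other columns: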

aggregate(money ~ person + year + sex, data, sum)

Or, with your approach:

aggregate(data$money, by = list(person=data$person, year=data$year, sex=data$sex), FUN=sum)
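As a minimal sketch of the two base R calls above, the following rebuilds the question's data and runs both. The object names by_formula and by_list are illustrative and not from the original answer. It also shows why the asker saw a column called x: the list interface names the summed column x, whereas the formula interface keeps the name money.

```r
# Rebuild the question's data (column names as in the question).
data <- data.frame(
  year    = rep(rep(2011:2014, each = 3), times = 2),
  person  = rep(c("kevin", "mike", "sally"), times = 8),
  expense = rep(c("truck", "house"), each = 12),
  sex     = rep(c("M", "M", "F"), times = 8),
  money   = c(1, 62, 60, 37, 53, 95, 21, 13, 38, 48, 4, 77,
              7, 94, 79, 86, 42, 46, 90, 76, 75, 70, 91, 62)
)

# Formula interface: the summed column keeps its name, "money".
by_formula <- aggregate(money ~ year + person + sex, data = data, FUN = sum)

# List interface: the summed column is named "x", so rename it afterwards.
by_list <- aggregate(data$money,
                     by = list(year = data$year,
                               person = data$person,
                               sex = data$sex),
                     FUN = sum)
names(by_list)[names(by_list) == "x"] <- "money"

# Sort by year, then person, to match the layout of the desired output.
by_formula <- by_formula[order(by_formula$year, by_formula$person), ]
print(by_formula, row.names = FALSE)
```

Adding sex to the grouping does not split any group further here, because each person has a single sex value, so the result still has one row per year and person.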

A package-based approach uses dplyr:

library(dplyr)
data %>% group_by(person, year, sex) %>% summarise(mon_sum=sum(money))
# Source: local data table [12 x 4]
# Groups: person, year
# 
# person  year    sex mon_sum
# (fctr) (int) (fctr)   (int)
# 1   kevin  2011      M       8
# 2    mike  2011      M     156
# 3   sally  2011      F     139
# 4   kevin  2012      M     123
# 5    mike  2012      M      95
# 6   sally  2012      F     141
# 7   kevin  2013      M     111
# 8    mike  2013      M      89
# 9   sally  2013      F     113
# 10  kevin  2014      M     118
# 11   mike  2014      M      95
# 12  sally  2014      F     139

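Another option is data.table. This approach has proven the most efficient in terms of both performance and programming time, and is worth learning: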

library(data.table)
setDT(data)[,sum(money), by=.(person,year, sex)]
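With the call above, the summed column gets the default name V1. A small variation, sketched here on a four-row subset of the question's data, names the result column inside .(). The object names dt and totals are illustrative, not from the original answer.

```r
library(data.table)

# A small subset of the question's data, enough to show the grouping.
dt <- data.table(
  year   = c(2011, 2011, 2011, 2011),
  person = c("kevin", "mike", "kevin", "mike"),
  sex    = c("M", "M", "M", "M"),
  money  = c(1, 62, 7, 94)
)

# Naming the expression inside .() gives the result column a name
# instead of the default V1.
totals <- dt[, .(money = sum(money)), by = .(year, person, sex)]
print(totals)
```

Note that setDT(data) converts the existing data frame to a data.table by reference, while this sketch builds a data.table directly and leaves no other object modified.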

Answer 1: (score: 2)

This is very easy with dplyr:

library(dplyr)
df %>% group_by(year, person, sex) %>% summarise(money = sum(money))

which returns

Source: local data frame [12 x 4]
Groups: year [?]

    year person    sex money
   (int) (fctr) (fctr) (int)
1   2011  kevin      M     8
2   2011   mike      M   156
3   2011  sally      F   139
4   2012  kevin      M   123
5   2012   mike      M    95
6   2012  sally      F   141
7   2013  kevin      M   111
8   2013   mike      M    89
9   2013  sally      F   113
10  2014  kevin      M   118
11  2014   mike      M    95
12  2014  sally      F   139
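The "Groups: year [?]" line in the output indicates that the result is still grouped. If a plain, ungrouped table is wanted, one option is the .groups argument of summarise, sketched below on a four-row subset of the question's data. This assumes dplyr 1.0.0 or later, which is newer than the version that produced the output above; the name totals is illustrative.

```r
library(dplyr)

# A small subset of the question's data, enough to show the grouping.
df <- data.frame(
  year   = c(2011, 2011, 2011, 2011),
  person = c("kevin", "mike", "kevin", "mike"),
  sex    = c("M", "M", "M", "M"),
  money  = c(1, 62, 7, 94)
)

# .groups = "drop" returns an ungrouped result (assumes dplyr >= 1.0.0).
totals <- df %>%
  group_by(year, person, sex) %>%
  summarise(money = sum(money), .groups = "drop")

print(totals)
```

On older dplyr versions, piping the result into ungroup() achieves the same effect.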