I am trying to aggregate data by two categories.
Data
year person expense sex money
2011 kevin truck M 1
2011 mike truck M 62
2011 sally truck F 60
2012 kevin truck M 37
2012 mike truck M 53
2012 sally truck F 95
2013 kevin truck M 21
2013 mike truck M 13
2013 sally truck F 38
2014 kevin truck M 48
2014 mike truck M 4
2014 sally truck F 77
2011 kevin house M 7
2011 mike house M 94
2011 sally house F 79
2012 kevin house M 86
2012 mike house M 42
2012 sally house F 46
2013 kevin house M 90
2013 mike house M 76
2013 sally house F 75
2014 kevin house M 70
2014 mike house M 91
2014 sally house F 62
I want to sum the money column for rows where the year and person columns match.
Desired output
year person sex money
2011 kevin M 8
2011 mike M 156
2011 sally F 139
2012 kevin M 123
2012 mike M 95
2012 sally F 141
2013 kevin M 111
2013 mike M 89
2013 sally F 113
2014 kevin M 118
2014 mike M 95
2014 sally F 139
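How do I do this?
I tried
data_aggregated = aggregate(data$money, by = list(name = data$name), FUN = sum)
but that only aggregated my data by name. I don't know how to aggregate it by both name and year. It also creates a data frame with only two columns: name and x.
I also tried concatenating name + year into an ID variable, but that seems too tedious.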
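Answer 0 (score: 4)
You can choose between the formula route and the list approach below. If you also want the sex column, add it as a grouping variable like the others: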
aggregate(money ~ person + year + sex, data, sum)
Or with your approach:
aggregate(data$money, by = list(person=data$person, year=data$year, sex=data$sex), FUN=sum)
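As a minimal sketch of the two base R calls above, the snippet below rebuilds only the 2011 rows of the question's data (the name `data_2011` is just for illustration) and runs both forms. The formula form keeps the column name `money`, while the list form names its result column `x`, so it is renamed afterwards.

```r
# Illustrative subset: the 2011 rows from the question's data
data_2011 <- data.frame(
  year    = rep(2011L, 6),
  person  = rep(c("kevin", "mike", "sally"), times = 2),
  expense = rep(c("truck", "house"), each = 3),
  sex     = rep(c("M", "M", "F"), times = 2),
  money   = c(1, 62, 60, 7, 94, 79)
)

# Formula interface: the summed column keeps the name "money"
by_formula <- aggregate(money ~ person + year + sex, data = data_2011, FUN = sum)

# List interface: the summed column is called "x", so rename it afterwards
by_list <- aggregate(
  data_2011$money,
  by  = list(person = data_2011$person, year = data_2011$year, sex = data_2011$sex),
  FUN = sum
)
names(by_list)[names(by_list) == "x"] <- "money"

print(by_formula)
print(by_list)
```

Both results have one row per person for 2011, with the truck and house amounts added together.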
A package-based approach, using dplyr:
library(dplyr)
data %>% group_by(person, year, sex) %>% summarise(mon_sum=sum(money))
# Source: local data table [12 x 4]
# Groups: person, year
#
# person year sex mon_sum
# (fctr) (int) (fctr) (int)
# 1 kevin 2011 M 8
# 2 mike 2011 M 156
# 3 sally 2011 F 139
# 4 kevin 2012 M 123
# 5 mike 2012 M 95
# 6 sally 2012 F 141
# 7 kevin 2013 M 111
# 8 mike 2013 M 89
# 9 sally 2013 F 113
# 10 kevin 2014 M 118
# 11 mike 2014 M 95
# 12 sally 2014 F 139
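Another option is data.table. This approach has proven to be the most efficient in terms of both performance and programming time, and is worth learning: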
library(data.table)
setDT(data)[,sum(money), by=.(person,year, sex)]
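The call above leaves the summed column with the default name `V1`. A small sketch, assuming the data.table package is installed and using only the 2011 rows for kevin and mike from the question (`dt` and `result` are illustrative names), shows how naming the expression inside `.()` gives a `money` column instead:

```r
library(data.table)

# Illustrative subset of the question's data
dt <- data.table(
  year   = c(2011L, 2011L, 2011L, 2011L),
  person = c("kevin", "mike", "kevin", "mike"),
  sex    = c("M", "M", "M", "M"),
  money  = c(1, 62, 7, 94)
)

# Naming the summed expression avoids the default column name V1
result <- dt[, .(money = sum(money)), by = .(person, year, sex)]
print(result)
```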
Answer 1 (score: 2)
Using dplyr:
library(dplyr)
df %>% group_by(year, person, sex) %>% summarise(money = sum(money))
returns
Source: local data frame [12 x 4]
Groups: year [?]
year person sex money
(int) (fctr) (fctr) (int)
1 2011 kevin M 8
2 2011 mike M 156
3 2011 sally F 139
4 2012 kevin M 123
5 2012 mike M 95
6 2012 sally F 141
7 2013 kevin M 111
8 2013 mike M 89
9 2013 sally F 113
10 2014 kevin M 118
11 2014 mike M 95
12 2014 sally F 139