Summing selected rows in data.table

Date: 2017-10-17 11:05:15

Tags: r data.table

I have a data frame as follows:

structure(list(code = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Month = structure(c(4L, 
3L, 7L, 1L, 8L, 6L, 5L, 2L, 9L, 4L, 3L, 7L, 1L, 8L, 6L, 5L, 2L, 
9L), .Label = c("Apr", "Aug", "Feb", "Jan", "Jul", "Jun", "Mar", 
"May", "Sep"), class = "factor"), Var1 = c(10L, 20L, 30L, 40L, 
50L, 60L, 70L, 80L, 90L, 10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 
90L), var2 = c(2, 3.5, 55, 23.5, 1, 1, 1, 1.5, 1, 1.5, 1, 1, 
1, 1.5, 1.75, 1.75, 1, 1)), .Names = c("code", "Month", 
"Var1", "var2"), class = "data.frame", row.names = c(NA, -18L
))

            code Month Var1  var2
1              0   Jan   10  2.00
2              0   Feb   20  3.50
3              0   Mar   30 55.00
4              0   Apr   40 23.50
5              0   May   50  1.00
6              0   Jun   60  1.00
7              0   Jul   70  1.00
8              0   Aug   80  1.50
9              0   Sep   90  1.00
10             1   Jan   10  1.50
11             1   Feb   20  1.00
12             1   Mar   30  1.00
13             1   Apr   40  1.00
14             1   May   50  1.50
15             1   Jun   60  1.75
16             1   Jul   70  1.75
17             1   Aug   80  1.00
18             1   Sep   90  1.00

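I want to create another data frame in which, for each code, the first three months (Jan, Feb, Mar) are summed into one row, the next three months (Apr, May, Jun) are summed into another row, and Jul, Aug and Sep are kept as individual rows.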

My expected data frame:

            code          Month         Var1  var2
1              0   <Jan+Feb+Mar>   <10+20+30>  <2.00 + 3.50 + 55.00>
4              0   <Apr+May+Jun>   <40+50+60>  <23.50 + 1.00 + 1.00>
7              0   Jul   70  1.00
8              0   Aug   80  1.50
9              0   Sep   90  1.00
... and similarly for code 1
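The grouping rule above can also be written down as data. A minimal base-R sketch, assuming a hypothetical lookup vector named `month_group` (it is not part of the question) whose names are month abbreviations and whose values are the output row each month is summed into:

```r
# Hypothetical lookup vector (not from the question) encoding the rule:
# names are month abbreviations, values are the output row label.
month_group <- c(
  Jan = "Jan+Feb+Mar", Feb = "Jan+Feb+Mar", Mar = "Jan+Feb+Mar",
  Apr = "Apr+May+Jun", May = "Apr+May+Jun", Jun = "Apr+May+Jun",
  Jul = "Jul", Aug = "Aug", Sep = "Sep"
)

# Month in the question is a factor. Indexing a named vector with a factor
# uses the integer codes, not the labels, so convert to character first.
Month <- factor(c("Jan", "Apr", "Jul"))
labels <- unname(month_group[as.character(Month)])
print(labels)
```

With data.table, a vector like this should let a single grouped call do the work, e.g. `d[, .(Var1 = sum(Var1), var2 = sum(var2)), by = .(code, Month = month_group[as.character(Month)])]`; treat that line as an untested sketch.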

Is there an efficient way to achieve this, particularly with data.table?

2 answers:

Answer 0 (score: 3)

A solution using data.table:

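library(data.table)
setDT(d)  # convert the data.frame to a data.table by reference
# Start each row's group label as a copy of its Month (a factor);
# := adds the new labels below as extra factor levels automatically
d[, group := Month]
d[Month %in% c("Jan", "Feb", "Mar"), group := "Jan+Feb+Mar"]
d[Month %in% c("Apr", "May", "Jun"), group := "Apr+May+Jun"]
# Sum both value columns per code and group, renaming group back to Month
d[, .(Var1 = sum(Var1), var2 = sum(var2)), .(code, Month = group)]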

Result:

    code       Month Var1  var2
 1:    0 Jan+Feb+Mar   60 60.50
 2:    0 Apr+May+Jun  150 25.50
 3:    0         Jul   70  1.00
 4:    0         Aug   80  1.50
 5:    0         Sep   90  1.00
 6:    1 Jan+Feb+Mar   60  3.50
 7:    1 Apr+May+Jun  150  4.25
 8:    1         Jul   70  1.75
 9:    1         Aug   80  1.00
10:    1         Sep   90  1.00
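To check the numbers above without data.table, the same steps can be reproduced in base R. This is a swapped-in technique (`stats::aggregate`, not the answer's data.table code), and it rebuilds the question's data with `Month` as a plain character column rather than a factor:

```r
# Rebuild the question's data (same values as the dput) as a base data.frame.
d <- data.frame(
  code  = rep(0:1, each = 9),
  Month = rep(c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"), 2),
  Var1  = rep(seq(10L, 90L, by = 10L), 2),
  var2  = c(2, 3.5, 55, 23.5, 1, 1, 1, 1.5, 1,
            1.5, 1, 1, 1, 1.5, 1.75, 1.75, 1, 1),
  stringsAsFactors = FALSE  # keep Month as character on older R versions too
)

# Same idea as the answer: start from the month, then overwrite Q1/Q2 labels.
d$group <- d$Month
d$group[d$Month %in% c("Jan", "Feb", "Mar")] <- "Jan+Feb+Mar"
d$group[d$Month %in% c("Apr", "May", "Jun")] <- "Apr+May+Jun"

# Sum both value columns per (code, group).
res <- aggregate(cbind(Var1, var2) ~ code + group, data = d, FUN = sum)
print(res)
```

`aggregate` orders its output rows by the grouping values rather than by first appearance, so the row order differs from the data.table result; the sums are the same.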

Answer 1 (score: 3)

Here is an option that uses %/% to create the groups:

library(data.table)
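# Quarter index 1 -> Jan_Feb_Mar, 2 -> Apr_May_Jun, 3 (Jul-Sep) -> NA;
# NA rows then keep their own month name, and both value columns are summed
setDT(df1)[, grp := c('Jan_Feb_Mar', 'Apr_May_Jun')[(match(Month, month.abb) - 1) %/% 3 + 1]
          ][is.na(grp), grp := as.character(Month)
          ][, .(Var1 = sum(Var1), var2 = sum(var2)), .(code, grp)]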
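The month-to-group arithmetic in this answer can be checked step by step in base R. `month.abb` is R's built-in vector of English month abbreviations; the vector `m` below is only an illustrative input:

```r
m <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep")

idx <- match(m, month.abb)                  # month number: 1..9
q   <- (idx - 1) %/% 3 + 1                  # quarter number: 1,1,1,2,2,2,3,3,3
grp <- c("Jan_Feb_Mar", "Apr_May_Jun")[q]   # index 3 is past the end -> NA

# Same role as the answer's is.na(grp) step: fall back to the month itself.
grp[is.na(grp)] <- m[is.na(grp)]
print(grp)
```

Jul, Aug and Sep land in quarter 3, which is past the end of the two-element label vector, so they come back as `NA` and are then filled with their own month name. The approach assumes `Month` holds English abbreviations exactly as in `month.abb`; a factor column also works because `match()` converts factors to character.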