I have a data frame as follows:
structure(list(code = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Month = structure(c(4L,
3L, 7L, 1L, 8L, 6L, 5L, 2L, 9L, 4L, 3L, 7L, 1L, 8L, 6L, 5L, 2L,
9L), .Label = c("Apr", "Aug", "Feb", "Jan", "Jul", "Jun", "Mar",
"May", "Sep"), class = "factor"), Var1 = c(10L, 20L, 30L, 40L,
50L, 60L, 70L, 80L, 90L, 10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L,
90L), var2 = c(2, 3.5, 55, 23.5, 1, 1, 1, 1.5, 1, 1.5, 1, 1,
1, 1.5, 1.75, 1.75, 1, 1)), .Names = c("code", "Month",
"Var1", "var2"), class = "data.frame", row.names = c(NA, -18L
))
code Month Var1 var2
1 0 Jan 10 2.00
2 0 Feb 20 3.50
3 0 Mar 30 55.00
4 0 Apr 40 23.50
5 0 May 50 1.00
6 0 Jun 60 1.00
7 0 Jul 70 1.00
8 0 Aug 80 1.50
9 0 Sep 90 1.00
10 1 Jan 10 1.50
11 1 Feb 20 1.00
12 1 Mar 30 1.00
13 1 Apr 40 1.00
14 1 May 50 1.50
15 1 Jun 60 1.75
16 1 Jul 70 1.75
17 1 Aug 80 1.00
18 1 Sep 90 1.00
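I want to create another data frame in which, for each code, I sum the first 3 months (Jan, Feb, Mar), sum the next 3 months (Apr, May, Jun), and keep Jul, Aug and Sep as individual rows.
My expected data frame: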
code Month Var1 var2
1 0 <Jan+Feb+Mar> <10+20+30> <2.00 + 3.50 + 55.00>
4 0 <Apr+May+Jun> <40+50+60> <23.50 + 1.00 + 1.00>
7 0 Jul 70 1.00
8 0 Aug 80 1.50
9 0 Sep 90 1.00
... and similarly for code 1
Is there an efficient way to achieve this, preferably using data.table?
Answer 0 (score: 3)
A solution using data.table:
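library(data.table)
setDT(d)  # d is the data frame from the question
# start with each month as its own group (character, not factor)
d[, group := as.character(Month)]
# collapse Jan-Mar and Apr-Jun into one label each
d[Month %in% c("Jan", "Feb", "Mar"), group := "Jan+Feb+Mar"]
d[Month %in% c("Apr", "May", "Jun"), group := "Apr+May+Jun"]
# sum both value columns within each code/group
d[, .(Var1 = sum(Var1), var2 = sum(var2)), .(code, Month = group)]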
Result:
code Month Var1 var2
1: 0 Jan+Feb+Mar 60 60.50
2: 0 Apr+May+Jun 150 25.50
3: 0 Jul 70 1.00
4: 0 Aug 80 1.50
5: 0 Sep 90 1.00
6: 1 Jan+Feb+Mar 60 3.50
7: 1 Apr+May+Jun 150 4.25
8: 1 Jul 70 1.75
9: 1 Aug 80 1.00
10: 1 Sep 90 1.00
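A minimal sketch generalizing the first answer, assuming you may want other month groupings later: the month-to-group mapping is kept in a named lookup vector (month_group, a hypothetical name), and lapply(.SD, sum) with .SDcols sums every value column without naming each one. The data is rebuilt inline with the same values as the question's dput so the block runs on its own.

```r
library(data.table)

# Same values as the question's data, built directly as a data.table
DT <- data.table(
  code  = rep(0:1, each = 9),
  Month = rep(month.abb[1:9], times = 2),
  Var1  = rep(seq(10L, 90L, by = 10L), times = 2),
  var2  = c(2, 3.5, 55, 23.5, 1, 1, 1, 1.5, 1,
            1.5, 1, 1, 1, 1.5, 1.75, 1.75, 1, 1)
)

# Hypothetical lookup: months to be merged map to a combined label;
# any month not listed keeps its own name.
month_group <- c(Jan = "Jan+Feb+Mar", Feb = "Jan+Feb+Mar", Mar = "Jan+Feb+Mar",
                 Apr = "Apr+May+Jun", May = "Apr+May+Jun", Jun = "Apr+May+Jun")

DT[, group := unname(month_group[Month])]  # NA for months not in the lookup
DT[is.na(group), group := Month]           # unlisted months stay individual

# Sum every value column at once instead of naming each one
value_cols <- c("Var1", "var2")
res <- DT[, lapply(.SD, sum), by = .(code, Month = group), .SDcols = value_cols]
print(res)
```

Because unlisted months fall through to their own names, Jul, Aug and Sep need no extra statements, and adding a value column only means extending value_cols.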
Answer 1 (score: 3)
Here is an option using %/%:
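# months 1-3 -> group 1, 4-6 -> group 2, 7-9 -> index 3 -> NA (kept as own month)
setDT(df1)[, grp := c('Jan_Feb_Mar', 'Apr_May_Jun')[(match(Month, month.abb) - 1) %/% 3 + 1]
  ][is.na(grp), grp := as.character(Month)
  ][, .(Var1 = sum(Var1), var2 = sum(var2)), .(code, grp)]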
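To see why the %/% trick works, here is the index arithmetic on its own for Jan through Sep (a standalone sketch that does not need data.table): integer division by 3 puts consecutive months into buckets 1, 2 and 3, and because the label vector has only two elements, bucket 3 becomes NA, which the answer then replaces with the original month name.

```r
# Month numbers 1..9, as returned by match(Month, month.abb) for Jan..Sep
m <- match(c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"), month.abb)

# Integer division buckets consecutive months in threes
idx <- (m - 1) %/% 3 + 1
print(idx)

# Indexing a length-2 vector with 3 yields NA, which is how Jul-Sep are flagged
grp <- c("Jan_Feb_Mar", "Apr_May_Jun")[idx]
print(grp)
```

Changing the divisor (for example %/% 6) or the label vector changes the bucket size without touching the rest of the chain.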