我有一个data.table
currentTime
按set.seed(1)
DT <- data.table(tag = rep(LETTERS[1:4],each = 2, times = 3),
year = rep(1:3, each = 8), month = rep(c(1:2), 12),
value = runif(24, 1, 10))
DT
tag year month value
1: A 1 1 3.389578
2: A 1 2 4.349115
3: B 1 1 6.155680
4: B 1 2 9.173870
5: C 1 1 2.815137
6: C 1 2 9.085507
7: D 1 1 9.502077
8: D 1 2 6.947180
9: A 2 1 6.662026
10: A 2 2 1.556076
11: B 2 1 2.853771
12: B 2 2 2.589011
13: C 2 1 7.183206
14: C 2 2 4.456933
15: D 2 1 7.928573
16: D 2 2 5.479293
17: A 3 1 7.458567
18: A 3 2 9.927155
19: B 3 1 4.420317
20: B 3 2 7.997007
21: C 3 1 9.412347
22: C 3 2 2.909283
23: D 3 1 6.865064
24: D 3 2 2.129996
,DT
和year
对此month
进行排序:
-value
将产生:
setorder(DT, year, month, -value)
我希望结果如下:
tag year month value
1: D 1 1 9.502077
2: B 1 1 6.155680
3: A 1 1 3.389578
4: C 1 1 2.815137
5: B 1 2 9.173870
6: C 1 2 9.085507
7: D 1 2 6.947180
8: A 1 2 4.349115
9: D 2 1 7.928573
10: C 2 1 7.183206
11: A 2 1 6.662026
12: B 2 1 2.853771
13: D 2 2 5.479293
14: C 2 2 4.456933
15: B 2 2 2.589011
16: A 2 2 1.556076
17: C 3 1 9.412347
18: A 3 1 7.458567
19: D 3 1 6.865064
20: B 3 1 4.420317
21: A 3 2 9.927155
22: B 3 2 7.997007
23: C 3 2 2.909283
24: D 3 2 2.129996
结果 tag year month value
1: D 1 1 9.502077
2: B 1 1 6.155680
3: B 1 2 9.173870
4: D 1 2 6.947180
5: D 2 1 7.928573
6: C 2 1 7.183206
7: D 2 2 5.479293
8: C 2 2 4.456933
9: C 3 1 9.412347
10: A 3 1 7.458567
11: A 3 2 9.927155
12: C 3 2 2.909283
将具有以下属性:在每年内只保留DT
中值较大的两个tags
。例如,month 1
,值较大的两个year 1
为tags
和D
,因此保留B
和D
为{ {1}}。 B
保留year 1
和year 2
。在每个D
,我需要重新选择年份的行。
答案 0 :(得分:3)
我们按'年'分组,获取'月'1的'标签'中的前两个元素(因为它已经被订购),使用%in%
创建逻辑索引并对行进行子集化。
DT[, .SD[tag %in% head(tag[month ==1],2)], .(year)]
# year tag month value
# 1: 1 D 1 9.502077
# 2: 1 B 1 6.155680
# 3: 1 B 2 9.173870
# 4: 1 D 2 6.947180
# 5: 2 D 1 7.928573
# 6: 2 C 1 7.183206
# 7: 2 D 2 5.479293
# 8: 2 C 2 4.456933
# 9: 3 C 1 9.412347
#10: 3 A 1 7.458567
#11: 3 A 2 9.927155
#12: 3 C 2 2.909283