I have a dataset of about 1 million rows, something like this DT:
library(data.table)
DT <- data.table(a = c(3,2,1,7,6,5),
                 b = c("1","1","1","2","2","2"),
                 c = c("2","2","2","3","3","3"),
                 d = c(5,6,7,8,9,0))
To select only the row with the maximum value of a within each (b, c) group, I use:
DT[DT[, .I[which.max(a)], by = list(b,c)]$V1]
which gives:
   a b c d
1: 3 1 2 5
2: 7 2 3 8
It works fine, but my concern is that it may not be the fastest or most efficient solution. Any suggestions are welcome!
Answer 0: (score: 0)
Here is another option using order. We sort the rows by 'a' in increasing order, group by the 'b' and 'c' columns, and take the last row of each group with tail:
DT[order(a), tail(.SD, 1) , .(b, c)]
Or with setorder, which sorts DT in place (by reference) instead of creating a sorted copy:
setorder(DT, a)[, tail(.SD, 1), .(b, c)]
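To see whether these approaches agree on the sample data, a minimal sketch along the following lines may help. The names res_which, res_order and res_setorder are illustrative and do not come from the posts above, and the sketch assumes the data.table package is installed. The grouped variants return the grouping columns (b, c) first, so the columns are put back in the original order before comparing. If a group has ties for the maximum of a, which.max keeps the first tied row while tail keeps the last, so the results could differ in that case.

```r
library(data.table)

DT <- data.table(a = c(3, 2, 1, 7, 6, 5),
                 b = c("1", "1", "1", "2", "2", "2"),
                 c = c("2", "2", "2", "3", "3", "3"),
                 d = c(5, 6, 7, 8, 9, 0))

# Approach from the question: row index of the max of a within each (b, c) group
res_which <- DT[DT[, .I[which.max(a)], by = .(b, c)]$V1]

# Approach from the answer: sort by a, then keep the last row of each group
res_order <- DT[order(a), tail(.SD, 1), by = .(b, c)]

# setorder() sorts by reference, so use a copy here to leave DT untouched
res_setorder <- setorder(copy(DT), a)[, tail(.SD, 1), by = .(b, c)]

# Grouped results list b and c first; restore the original column order
setcolorder(res_order, names(DT))
setcolorder(res_setorder, names(DT))

# Compare each answer variant against the approach from the question
all.equal(res_which, res_order)
all.equal(res_which, res_setorder)
```

For the actual speed question, the same three expressions could be timed on the real 1-million-row table, for example with system.time(), since relative performance will likely depend on the number and size of the groups.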