I am looking for a practical way (preferably using data.table) to retrieve, for each group, the value closest to 0.
Assume the following DT:
set.seed(1)
library(data.table)
DT <- data.table(val = rnorm(1000), group = rep(1:10, each = 10)) # 10 groups
I tried combining by = group with roll = "nearest", but this returns only the closest value across all groups, not the closest value within each group:
DT[val == 0, val, by = group, roll = "nearest"]
# group value
#1: 8 0.001105352
I could of course repeat the process for each group, but that becomes impractical as the number of groups grows. For example:
res <- rbind(DT[val == 0 & group == 1, val, by = group, roll = "nearest"],
DT[val == 0 & group == 2, val, by = group, roll = "nearest"],
DT[val == 0 & group == 3, val, by = group, roll = "nearest"],
...)
Perhaps I am missing some data.table functionality?
Answer 0 (score: 3)
You don't necessarily need a join.
A possible solution using a combination of min and abs:
DT[, .(closest.val.to.zero = val[abs(val) == min(abs(val))]), by = group]
which gives:
    group closest.val.to.zero
 1:     1         0.011292688
 2:     2        -0.016190263
 3:     3         0.002131860
 4:     4         0.004398704
 5:     5         0.017395620
 6:     6         0.002415809
 7:     7         0.004884450
 8:     8         0.001105352
 9:     9        -0.040150452
10:    10        -0.010925691
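One caveat with the `==` comparison: if two values in a group are equally close to zero (for example 0.5 and -0.5), it returns both, so that group gets more than one row. A minimal sketch of a variant using `which.min`, which is not part of the original answer, keeps only the first match per group:

```r
library(data.table)

set.seed(1)
DT <- data.table(val = rnorm(1000), group = rep(1:10, each = 10))

# which.min() returns the index of the first minimum only,
# so each group yields exactly one row even when there are ties
res <- DT[, .(closest.val.to.zero = val[which.min(abs(val))]), by = group]
print(res)
```

With continuous data such as `rnorm` output, ties are very unlikely, so both versions should give the same result here.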
A more general approach, as posted by @chinsoon12 in the comments:
DT[CJ(group = group, val = 0, unique = TRUE)
, on = .(group, val)
, .(group, closest.val.to.zero = x.val)
, roll = "nearest"]
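The rolling join generalizes to targets other than zero by changing the value passed to CJ. The sketch below assumes a hypothetical `target` of 0.5 (not in the original question) and cross-checks the join against a grouped `which.min` computation:

```r
library(data.table)

set.seed(1)
DT <- data.table(val = rnorm(1000), group = rep(1:10, each = 10))

target <- 0.5  # hypothetical target value, chosen only for illustration

# Rolling join: one lookup row per group, rolled to the nearest val in DT
roll_res <- DT[CJ(group = group, val = target, unique = TRUE),
               on = .(group, val),
               .(group, closest = x.val),
               roll = "nearest"]

# Grouped computation used as a cross-check
abs_res <- DT[, .(closest = val[which.min(abs(val - target))]), by = group]

print(all.equal(roll_res$closest, abs_res$closest))
```

The join form tends to be convenient when the targets differ per group, since the lookup table can carry a different val for each group.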