如何在data.table 1.9.7中使用变量列名进行非equi连接?例如,
已知列名:
library(data.table)
dt <- data.table(x=round(rnorm(10)), y=rnorm(10))
binDT <- data.table(LB=c(-Inf, -1, 0, .2, .7, 1.5, 3), RB=c(-1, 0, .2, .7, 1.5, 3, Inf))
dt[binDT, on=.(x>=LB, x<RB)]
x y x.1
1: -Inf 2.2669 -1.0
2: -1.0 -0.5453 0.0
3: -1.0 0.5125 0.0
4: 0.0 1.4151 0.2
5: 0.0 -0.1440 0.2
6: 0.0 -1.1802 0.2
7: 0.0 0.3338 0.2
8: 0.0 -1.8220 0.2
9: 0.2 NA 0.7
10: 0.7 0.3155 1.5
11: 0.7 -0.6284 1.5
12: 1.5 NA 3.0
13: 3.0 NA Inf
变量列名:
colName <- "x"
dt[binDT, on=.(get(colName)>=LB, get(colName)<RB)] # Error
dt[binDT, on=eval(parse(text="list(x>=LB, x<RB)"))] # Error
答案 0 :(得分:7)
@Shape答案很好,但有更简单的方法来实现它。 on
参数可以采用字符向量,因此可以将粘贴预期的列和运算符。
colName="x"
on=sprintf(c("%s>=LB","%s<RB"), colName)
print(on)
#[1] "x>=LB" "x<RB"
dt[binDT, on=on]
# x y x.1
# 1: -Inf NA -1.0
# 2: -1.0 0.48127355 0.0
# 3: 0.0 0.11779604 0.2
# 4: 0.0 -0.97891522 0.2
# 5: 0.0 -0.05969859 0.2
# 6: 0.0 -0.05625401 0.2
# 7: 0.2 NA 0.7
# 8: 0.7 -0.84438216 1.5
# 9: 0.7 0.80151913 1.5
#10: 1.5 -0.11013456 3.0
#11: 1.5 0.82139242 3.0
#12: 3.0 -1.24386831 Inf
答案 1 :(得分:4)
使用替换为虚拟变量名称并将其提供给命名列表:
res <- substitute(dt[binDT,on=.(A>=LB,B<RB)],
list(A = as.name(colName),
B = as.name(colName)))
# the values get replaced in the call
> res
dt[binDT, on = .(x >= LB, x < RB)]
eval(res)
x y x.1
1: -Inf NA -1.0
2: -1.0 0.69668714 0.0
3: -1.0 -0.03824623 0.0
4: 0.0 0.91269554 0.2
5: 0.0 0.42322463 0.2
6: 0.0 -0.22891670 0.2
7: 0.0 0.61413004 0.2
8: 0.2 NA 0.7
9: 0.7 -1.47526635 1.5
10: 0.7 -1.12899562 1.5
11: 0.7 1.05462948 1.5
12: 1.5 -0.04467894 3.0
13: 3.0 NA Inf