This question is a follow-up to an earlier one:
left outer join with data.table with different names for key variables
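I am now trying to use the 'on' argument in the way Matt Dowle suggested. To my confusion, it does not work. I wonder whether this currently only works in version 1.9.7 of data.table.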
packageVersion('data.table')
# [1] ‘1.9.6’
DT1 = data.table(x1 = c("b", "c", "a", "b", "a", "b"),
x2a = as.character(1:6), m1 = seq(10, 60, by = 10))
DT1
# x1 x2a m1
# 1: b 1 10
# 2: c 2 20
# 3: a 3 30
# 4: b 4 40
# 5: a 5 50
# 6: b 6 60
DT2 = data.table(x1 = c("b", "d", "c", "b" ,"a", "a"),
x2b = c(1, 4, 7, 6, " ", " ") ,m2 = 5:10)
DT2
# x1 x2b m2
# 1: b 1 5
# 2: d 4 6
# 3: c 7 7
# 4: b 6 8
# 5: a 9
# 6: a 10
#### merge command works fine
rtL <- merge(DT1, DT2, by.x = c('x1', 'x2a'),
by.y = c('x1', 'x2b'), all.x = TRUE)
rtL
# x1 x2a m1 m2
# 1: a 3 30 NA
# 2: a 5 50 NA
# 3: b 1 10 5
# 4: b 4 40 NA
# 5: b 6 60 8
# 6: c 2 20 NA
#### Join with the X[Y] syntax with the 'on' argument
rtL2 <- DT2[DT1, on = c('x1', x2a = 'x2b')]
# Error in forderv(x, by = rightcols) :
#   'by' value -2147483648 out of range [1,3]
What is the problem here? Does this require version 1.9.7?
##### Another attempt with the x1 variable in quotes
rtL3 <- DT2[DT1, on = c("x1", "x2a" = "x2b")]
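# Error in forderv(x, by = rightcols) :
#   'by' value -2147483648 out of range [1,3]
To me, the rtL2 version looks like the more correct one.
How should I interpret the error message? What am I doing wrong here?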
Answer 0 (score: 4)
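Indeed, this is an issue with 1.9.6 that has since been fixed in the development version.
The problem (besides the ordering) is that you do not name the counterpart in DT2 for x1; see GitHub: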
- When joining with on=, X[Y, on=c(A="A", b="c")] can now be specified as X[Y, on=c("A", b="c")], fully closes #1375.
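To make the naming rule concrete, here is a minimal sketch on two toy tables. The tables X and Y and their columns are hypothetical and not taken from the question. In X[Y, on = ...], the names of the 'on' vector refer to columns of X and the values refer to columns of Y. The shorthand in the second call assumes a data.table version that includes the #1375 fix.

```r
library(data.table)

# Hypothetical toy tables, for illustration only
X <- data.table(A = c("p", "q"), b = 1:2, vx = c(10, 20))
Y <- data.table(A = c("p", "q"), c = 1:2, vy = c(100, 200))

# Fully named form: names are columns of X, values are columns of Y.
# This is the form that also works on 1.9.6.
full <- X[Y, on = c(A = "A", b = "c")]

# Shorthand for a column with the same name in both tables.
# Assumes a version that contains the #1375 fix.
short <- X[Y, on = c("A", b = "c")]

# Both calls should describe the same join
all.equal(full, short)
```

Writing every pair out by name is more verbose, but it does not depend on which version is installed.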
As for 1.9.6, the following works:
packageVersion('data.table')
# [1] ‘1.9.6’
DT1[DT2, on = c(x1 = "x1", x2a = "x2b")]
# x1 x2a m1 m2
# 1: b 1 10 5
# 2: d 4 NA 6
# 3: c 7 NA 7
# 4: b 6 60 8
# 5: a NA 9
# 6: a NA 10
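Note that DT1[DT2, ...] keeps every row of DT2, whereas the merge call in the question uses all.x = TRUE and therefore keeps every row of DT1. A sketch of the join in the other direction, assuming the same naming rule (names of 'on' are columns of the outer table, here DT2), could look like this. The setnames and setcolorder steps are only an assumption about how one might line the columns up with the merge output.

```r
library(data.table)

DT1 <- data.table(x1 = c("b", "c", "a", "b", "a", "b"),
                  x2a = as.character(1:6), m1 = seq(10, 60, by = 10))
DT2 <- data.table(x1 = c("b", "d", "c", "b", "a", "a"),
                  x2b = c(1, 4, 7, 6, " ", " "), m2 = 5:10)

# DT1 is in the i position, so every row of DT1 is kept.
# Names in 'on' are DT2 columns, values are DT1 columns.
left <- DT2[DT1, on = c(x1 = "x1", x2b = "x2a")]

# The join column carries DT2's name (x2b) but DT1's values;
# rename and reorder to mirror the merge() result.
setnames(left, "x2b", "x2a")
setcolorder(left, c("x1", "x2a", "m1", "m2"))
left
```

Unlike merge, this result keeps the row order of DT1 rather than sorting by the join columns.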