X[Y] join syntax with the 'on' argument

Time: 2016-01-08 21:09:24

Tags: r data.table

This question is a follow-up to an earlier question:

left outer join with data.table with different names for key variables

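I am now trying to use the 'on' argument in the way Matt Dowle suggested. I am puzzled that it does not work, and I wonder whether this currently works only with version 1.9.7 of data.table.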

packageVersion('data.table')
# [1] ‘1.9.6’

DT1 = data.table(x1 = c("b", "c", "a", "b", "a", "b"), 
                 x2a = as.character(1:6), m1 = seq(10, 60, by = 10))
DT1
#    x1 x2a m1
# 1:  b   1 10
# 2:  c   2 20
# 3:  a   3 30
# 4:  b   4 40
# 5:  a   5 50
# 6:  b   6 60

DT2 = data.table(x1 = c("b", "d", "c", "b" ,"a", "a"),
                 x2b = c(1, 4, 7, 6, " ", " ") ,m2 = 5:10)
DT2
#    x1 x2b m2
# 1:  b   1  5
# 2:  d   4  6
# 3:  c   7  7
# 4:  b   6  8
# 5:  a      9
# 6:  a     10

#### merge command works fine
rtL <- merge(DT1, DT2, by.x = c('x1', 'x2a'),
             by.y = c('x1', 'x2b'), all.x = TRUE)
rtL
#    x1 x2a m1 m2
# 1:  a   3 30 NA
# 2:  a   5 50 NA
# 3:  b   1 10  5
# 4:  b   4 40 NA
# 5:  b   6 60  8
# 6:  c   2 20 NA   

#### Join with the X[Y] syntax with the 'on' argument
rtL2 <- DT2[DT1, on = c('x1', x2a = 'x2b')]
  

Error in forderv(x, by = rightcols) : 'by' value -2147483648 out of range [1,3]

What is wrong here? Does this require version 1.9.7?

##### Another attempt with the x1 variable in quotes
rtL3 <- DT2[DT1, on = c("x1", "x2a" = "x2b")] 
  

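Error in forderv(x, by = rightcols) : 'by' value -2147483648 out of range [1,3]

To me, the rtL2 version looks more correct.

How should I interpret the error message? What am I doing wrong here?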

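1 Answer:

Answer 0 (score: 4)

Indeed, this is an issue with 1.9.6 that has since been fixed in the development version.

The problem (apart from the ordering) is that you do not name the counterpart of x1 in DT2; see the NEWS entry on GitHub: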


  When joining with on=, X[Y, on=c(A="A", b="c")] can now be specified as X[Y, on=c("A", b="c")], fully closes #1375.

As for 1.9.6, the following works:

packageVersion('data.table')
# [1] ‘1.9.6’
DT1[DT2, on = c(x1 = "x1", x2a = "x2b")]
#    x1 x2a m1 m2
# 1:  b   1 10  5
# 2:  d   4 NA  6
# 3:  c   7 NA  7
# 4:  b   6 60  8
# 5:  a     NA  9
# 6:  a     NA 10
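
The call above keeps every row of DT2, whereas the merge in the question uses all.x = TRUE and keeps every row of DT1. As a minimal sketch of the "ordering" point, assuming the usual rule that in X[Y, on = ...] the names refer to columns of X and the values to columns of Y, the left join on DT1 could be written as below. The variable names left_join and left_join_short are illustrative only, and the shorthand form assumes a data.table release newer than 1.9.6.

```r
library(data.table)

# Same example tables as in the question.
DT1 <- data.table(x1 = c("b", "c", "a", "b", "a", "b"),
                  x2a = as.character(1:6), m1 = seq(10, 60, by = 10))
DT2 <- data.table(x1 = c("b", "d", "c", "b", "a", "a"),
                  x2b = c(1, 4, 7, 6, " ", " "), m2 = 5:10)

# To keep every row of DT1, DT1 goes in the i position (Y) and DT2 is X.
# Names in `on` are DT2 columns, values are DT1 columns.
left_join <- DT2[DT1, on = c(x1 = "x1", x2b = "x2a")]

# Shorthand for the identically named column x1
# (assumption: a data.table version newer than 1.9.6).
left_join_short <- DT2[DT1, on = c("x1", x2b = "x2a")]

print(left_join)
```

The rows come back in DT1's order rather than sorted by the join columns as merge() returns them, so a direct comparison with rtL would need a reordering step first.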