r data.table: joining multiple columns in place

Time: 2018-10-30 18:52:45

Tags: r data.table

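data.table is awesome.

I want to do an in-place join, but keep all the columns from both tables. This question demonstrates how to do it for a single column. How can I generalize that when I want every column of the joined table to end up in the final result, all held in a single memory location?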

library(data.table)
dt1 <- data.table(col1 = c("a", "b", "c"), 
                  col2 = 1:3, 
                  col3 = c(TRUE, FALSE, FALSE))

setkey(dt1, col1)

set.seed(1)
dt2 <- data.table(col1 = sample(c("a", "b", "c"), size = 10, replace = TRUE), 
                  another_col = sample(1:10, size = 10, replace = TRUE), 
                  and_anouther = sample(c(TRUE, FALSE), size = 10, replace = TRUE))

setkey(dt2, col1)

# I want to stick the columns from dt1 onto dt2

# this works
dt3 <- dt2[dt1]
dt3
    col1 another_col and_anouther col2  col3
 1:    a           9        FALSE    1  TRUE
 2:    b           2        FALSE    2 FALSE
 3:    b           9        FALSE    2 FALSE
 4:    b           6        FALSE    2 FALSE
 5:    b           5         TRUE    2 FALSE
 6:    b           8        FALSE    2 FALSE
 7:    c           9         TRUE    3 FALSE
 8:    c           5        FALSE    3 FALSE
 9:    c           7        FALSE    3 FALSE
10:    c           6        FALSE    3 FALSE

# but I want to do this by reference

# this works for one column
dt2[dt1, col2 := i.col2]
dt2

    col1 another_col and_anouther col2
 1:    a           3        FALSE    1
 2:    a           8         TRUE    1
 3:    a           8         TRUE    1
 4:    b           2         TRUE    2
 5:    b           7        FALSE    2
 6:    b          10         TRUE    2
 7:    b           4        FALSE    2
 8:    c           4         TRUE    3
 9:    c           5         TRUE    3
10:    c           8         TRUE    3

# ok, remove that column
dt2[, col2 := NULL]

# now try to join multiple columns 
# this doesn't work
dt2[dt1, (col2 := i.col2, 
          col3 := i.col3)]

# neither does this
dt2[dt1, .(col2 := i.col2, 
          col3 := i.col3)]

# this just gives me the two columns
dt2[dt1, .(col2 = i.col2, 
           col3 = i.col3)]
dt2
   col2  col3
 1:    1  TRUE
 2:    1  TRUE
 3:    1  TRUE
 4:    2 FALSE
 5:    2 FALSE
 6:    2 FALSE
 7:    2 FALSE
 8:    3 FALSE
 9:    3 FALSE
10:    3 FALSE  


Created on 2018-10-30 by the reprex package (v0.2.1)

I want the result I get in dt3, but I want it created by reference on dt2. Thanks!

2 answers:

Answer 0 (score: 3)

I should have looked at one more of the questions linked to in this awesome reference. All I needed to do was use the functional form of the := operator.

dt2[dt1, `:=` (col2 = i.col2, 
          col3 = i.col3)]

dt2
    col1 another_col and_anouther col2  col3
 1:    a           3        FALSE    1  TRUE
 2:    a           8         TRUE    1  TRUE
 3:    a           8         TRUE    1  TRUE
 4:    b           2         TRUE    2 FALSE
 5:    b           7        FALSE    2 FALSE
 6:    b          10         TRUE    2 FALSE
 7:    b           4        FALSE    2 FALSE
 8:    c           4         TRUE    3 FALSE
 9:    c           5         TRUE    3 FALSE
10:    c           8         TRUE    3 FALSE
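
To generalize this to every non-key column of dt1 without typing each name, one option is to build the column names programmatically. The following is only a sketch, assuming the join column is col1 and using a small hand-written dt2 instead of the sampled one above; the variable names cols and addr_before are made up for illustration.

```r
library(data.table)

dt1 <- data.table(col1 = c("a", "b", "c"),
                  col2 = 1:3,
                  col3 = c(TRUE, FALSE, FALSE))

# small deterministic stand-in for dt2 (not the sampled data from the question)
dt2 <- data.table(col1 = c("a", "a", "b", "c", "c"),
                  another_col = c(3L, 8L, 2L, 4L, 5L))

# remember where dt2 lives so we can check it was not copied
addr_before <- address(dt2)

# every column of dt1 except the join column
cols <- setdiff(names(dt1), "col1")

# mget() fetches i.col2, i.col3, ... as a list; (cols) names the new columns
dt2[dt1, (cols) := mget(paste0("i.", cols)), on = "col1"]

# TRUE when dt2 was updated in place rather than copied
same_object <- identical(address(dt2), addr_before)

dt2
```

Wrapping cols in parentheses on the left-hand side tells data.table to use the character vector's values as column names instead of creating a column literally called cols.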

Answer 1 (score: 2)

The functional syntax is cleaner than the standard syntax, which looks like this:

dt2[dt1, c("col2", "col3") := .(col2, col3), on = c(col1 = "col1")][order(col1)]

    col1 another_col and_anouther col2  col3
 1:    a           3        FALSE    1  TRUE
 2:    a           8         TRUE    1  TRUE
 3:    a           8         TRUE    1  TRUE
 4:    b           2         TRUE    2 FALSE
 5:    b           7        FALSE    2 FALSE
 6:    b          10         TRUE    2 FALSE
 7:    b           4        FALSE    2 FALSE
 8:    c           4         TRUE    3 FALSE
 9:    c           5         TRUE    3 FALSE
10:    c           8         TRUE    3 FALSE
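
One thing to keep in mind is that an update join like this is not quite the same as dt2[dt1]: it keeps the rows and row order of the table being updated, and rows with no match in the other table get NA in the new columns. Here is a minimal sketch of that behaviour, with made-up tables named lookup and target and an explicit i. prefix to make clear which table the values come from.

```r
library(data.table)

# hypothetical lookup table (plays the role of dt1)
lookup <- data.table(col1 = c("a", "b"),
                     col2 = 1:2)

# hypothetical table to update (plays the role of dt2); "d" has no match
target <- data.table(col1 = c("b", "d", "a"),
                     x = 1:3)

# standard LHS := RHS form; i.col2 refers to col2 from lookup
target[lookup, c("col2") := .(i.col2), on = "col1"]

# target keeps its original row order b, d, a;
# col2 is 2 for "b", NA for the unmatched "d", and 1 for "a"
target
```

If unmatched rows should get some default instead of NA, that would need a separate step afterwards.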