Chaining multiple data.table::merge operations with data.tables

Time: 2018-09-20 01:09:52

Tags: r merge data.table

Is it possible to chain multiple merge operations one after another with data.tables?

The functionality would be similar to chaining several joins of data.frames in a dplyr pipe, but for data.tables: merge two data.tables as in the code below, manipulate the resulting data.table as needed, and only then merge in another data.table.

library(dplyr)
library(data.table)

# data.frame
df1 = data.frame(food = c("apples", "bananas", "carrots", "dates"),
                 quantity = c(1:4))
df2 = data.frame(food = c("apples", "bananas", "carrots", "dates"),
                 status = c("good", "bad", "rotten", "raw"))
df3 = data.frame(food = c("apples", "bananas", "carrots", "dates"),
                 rank = c("okay", "good", "better", "best"))

df4 = left_join(df1, df2, by = "food") %>%
  mutate(new_col = NA) %>%  # this is just to hold a position of mutation in the data.frame
  left_join(., df3, by = "food")

# data.table
dt1 = data.table(food = c("apples", "bananas", "carrots", "dates"),
                 quantity = c(1:4))
dt2 = data.table(food = c("apples", "bananas", "carrots", "dates"),
                 status = c("good", "bad", "rotten", "raw"))
dt3 = data.table(food = c("apples", "bananas", "carrots", "dates"),
                 rank = c("okay", "good", "better", "best"))

# this is what I am not sure how to implement
dt4 = merge(dt1, dt2, by = "food")[
  food == "apples"](merge(dt4))

Following @chinsoon12's comment, I acknowledge that this SO question here may be very similar.

Thanks for your help!


1 answer:

Answer 0 (score: 3)

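Multiple data.table joins can be chained using the "on" argument. Note that without an update operator (":=") in j, X[Y, on = ...] is a right join (it returns one row for every row of Y), whereas using ":=" (i.e. adding columns to X) turns it into a left outer join that keeps every row of X. A useful post on left joins is here: Left join using data.table.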
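To make the right-join vs. left-join distinction concrete, here is a minimal sketch using two hypothetical tables, x and y, whose keys only partly overlap (the names and values are illustrative, not from the question):

```r
library(data.table)

# Hypothetical tables whose keys only partly overlap
x <- data.table(food = c("apples", "bananas", "carrots"), quantity = 1:3)
y <- data.table(food = c("bananas", "carrots", "eggs"),
                status = c("bad", "rotten", "fresh"))

# Without :=, the result has one row per row of y (a right join from x's
# point of view): "apples" is dropped and "eggs" gets quantity NA
right <- x[y, on = "food"]
print(right)

# With :=, x itself is updated by reference: all rows of x are kept and
# status is NA where y has no match (a left outer join onto x)
x[y, on = "food", status := i.status]
print(x)
```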

An example using the sample data above, with a subset between the joins:

# join status from dt2, keep only apples, then join rank from dt3
dt4 <- dt1[dt2, on = "food", `:=`(status = i.status)][
            food == "apples"][dt3, on = "food", rank := i.rank]

##> dt4
##     food quantity status rank
##1: apples        1   good okay
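Because ":=" updates by reference, the first join in the chain above also adds status to dt1 itself. If dt1 should stay unchanged, one option (a sketch, not part of the original answer) is to start the chain from copy(dt1):

```r
library(data.table)

dt1 <- data.table(food = c("apples", "bananas", "carrots", "dates"), quantity = 1:4)
dt2 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  status = c("good", "bad", "rotten", "raw"))
dt3 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  rank = c("okay", "good", "better", "best"))

# copy() makes a deep copy, so status is added to the copy, not to dt1
dt4 <- copy(dt1)[dt2, on = "food", status := i.status][
  food == "apples"][dt3, on = "food", rank := i.rank]

print(names(dt1))  # dt1 still has only "food" and "quantity"
print(dt4)
```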

An example that adds a new column between the joins:

# join status, add a placeholder column, then join rank
dt4 <- dt1[dt2, on = "food", `:=`(status = i.status)][
            , new_col := NA][dt3, on = "food", rank := i.rank]

##> dt4
##      food quantity status new_col   rank
##1:  apples        1   good      NA   okay
##2: bananas        2    bad      NA   good
##3: carrots        3 rotten      NA better
##4:   dates        4    raw      NA   best
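Since every ":=" in this chain works on dt1 itself, dt4 and dt1 end up as the same modified table. A variant that leaves dt1 untouched (a sketch, assuming the same dt1, dt2 and dt3) starts the chain from merge(), which returns a new data.table:

```r
library(data.table)

dt1 <- data.table(food = c("apples", "bananas", "carrots", "dates"), quantity = 1:4)
dt2 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  status = c("good", "bad", "rotten", "raw"))
dt3 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  rank = c("okay", "good", "better", "best"))

# merge() returns a new data.table, so the := calls below modify that
# result rather than dt1
dt4 <- merge(dt1, dt2, by = "food")[, new_col := NA][
  dt3, on = "food", rank := i.rank]

print(dt4)
```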

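An example using merge and the magrittr pipe (the output assumes dt1, dt2 and dt3 are freshly created, since the two joins above add columns to dt1 in place):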

library(magrittr)  # provides %>%
dt4 <- merge(dt1, dt2, by = "food") %>%
  set(j = "new_col", value = NA) %>%  # adds new_col by reference, returns the table
  merge(dt3, by = "food")

##> dt4
##      food quantity status new_col   rank
##1:  apples        1   good      NA   okay
##2: bananas        2    bad      NA   good
##3: carrots        3 rotten      NA better
##4:   dates        4    raw      NA   best
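For an arbitrary number of tables, another common idiom (not part of the original answer) folds merge() over a list with base R's Reduce(). This sketch assumes every table shares the key column food; all.x = TRUE keeps all rows of the left-hand table at each step, similar to dplyr::left_join(), although merge() sorts the result by the key:

```r
library(data.table)

dt1 <- data.table(food = c("apples", "bananas", "carrots", "dates"), quantity = 1:4)
dt2 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  status = c("good", "bad", "rotten", "raw"))
dt3 <- data.table(food = c("apples", "bananas", "carrots", "dates"),
                  rank = c("okay", "good", "better", "best"))

# Reduce() applies merge() pairwise: merge(merge(dt1, dt2), dt3)
dt4 <- Reduce(function(x, y) merge(x, y, by = "food", all.x = TRUE),
              list(dt1, dt2, dt3))

print(dt4)
```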