Consider two data tables with different numbers of key columns:
library(data.table)
tmp_dt <- data.table(group1 = letters[1:5], group2 = c(1, 1, 2, 2, 2), a = rnorm(5), key = c("group1", "group2"))
tmp_dt2 <- data.table(group2 = c(1, 2, 3), color = c("r", "g", "b"), key = "group2")
I want to join tmp_dt with tmp_dt2 on group2, but the following fails:
tmp_dt[tmp_dt2]
> tmp_dt[tmp_dt2]
Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch, :
x.'group1' is a character column being joined to i.'group2' which is type 'double'. Character columns must join to factor or character columns.
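This makes sense, because it tries to join the tables on the first key variable. How can I fix this so that it behaves the same as dplyr::inner_join, without paying twice the cost by resetting the key on tmp_dt?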
> inner_join(tmp_dt, tmp_dt2, by = "group2")
group1 group2 a color
1 a 1 0.2501413 r
2 b 1 0.6182433 r
3 c 2 -0.1726235 g
4 d 2 -2.2239003 g
5 e 2 -1.2636144 g
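To make the failure above concrete, here is a minimal sketch that inspects the keys of both tables and captures the error instead of stopping. The `set.seed` call and the `res` variable are my additions for illustration, not part of the original question.

```r
library(data.table)

# Same tables as in the question; set.seed is added only for reproducibility.
set.seed(1)
tmp_dt <- data.table(group1 = letters[1:5], group2 = c(1, 1, 2, 2, 2),
                     a = rnorm(5), key = c("group1", "group2"))
tmp_dt2 <- data.table(group2 = c(1, 2, 3), color = c("r", "g", "b"),
                      key = "group2")

key(tmp_dt)   # "group1" "group2"
key(tmp_dt2)  # "group2"

# Without `on`, x[i] pairs the leading key column of x (group1, character)
# with the key column of i (group2, double), so the join fails.
res <- tryCatch(tmp_dt[tmp_dt2], error = function(e) conditionMessage(e))
res
```

The exact wording of the error may differ between data.table versions, but the cause is the same type mismatch between group1 and group2.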
Answer 0 (score: 1)
Using lapply:
tmp_dt[,color:=unlist(lapply(.BY, function(x) tmp_dt2[group2==x, color])), by=group2]
Or, as Frank pointed out in the comments, using on:
tmp_dt[tmp_dt2, on="group2"]
tmp_dt2[tmp_dt, on="group2"]
Using on is roughly twice as fast as using lapply with .BY. Although the first example returns the ...
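One detail worth spelling out: by default `tmp_dt[tmp_dt2, on = "group2"]` keeps every row of tmp_dt2, so the unmatched group2 == 3 row shows up with NA values. A minimal sketch of getting inner_join-style output follows, assuming a data.table version that accepts `nomatch = NULL` (older versions use `nomatch = 0L`). The names `right` and `inner` are mine, for illustration only.

```r
library(data.table)

# Same tables as in the question; set.seed is added only for reproducibility.
set.seed(1)
tmp_dt <- data.table(group1 = letters[1:5], group2 = c(1, 1, 2, 2, 2),
                     a = rnorm(5), key = c("group1", "group2"))
tmp_dt2 <- data.table(group2 = c(1, 2, 3), color = c("r", "g", "b"),
                      key = "group2")

# Default: one row per match, plus an NA row for group2 == 3 (no match in tmp_dt).
right <- tmp_dt[tmp_dt2, on = "group2"]
nrow(right)  # 6

# nomatch = NULL drops unmatched rows, which gives inner-join semantics.
inner <- tmp_dt[tmp_dt2, on = "group2", nomatch = NULL]
nrow(inner)  # 5

# The existing key on tmp_dt is left as it was; `on` does not call setkey.
key(tmp_dt)  # "group1" "group2"
```

Because `on` performs an ad hoc join, tmp_dt is not re-keyed, which is the cost the question wanted to avoid.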
Answer 1 (score: 0)
You should use this code:
tmp_dt2[tmp_dt, on = 'group2']