dplyr 0.3 cannot inner_join data.tables?

Asked: 2014-09-27 09:07:54

Tags: r data.table dplyr

I have the following setup, with dplyr (0.3) and data.table (1.9.3) loaded.

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.3 dplyr_0.3       

loaded via a namespace (and not attached):
[1] assertthat_0.1 DBI_0.3.1      magrittr_1.0.1 parallel_3.1.1 plyr_1.8.1     Rcpp_0.11.2   
[7] reshape2_1.4   stringr_0.6.2  tools_3.1.1 

Here are the data sets: two data.tables and two data.frames. Both pairs hold the same content.

DT_1 = data.table(x = rep(c("a","b","c"), each = 3), y = c(1,3,6), v = 1:9)
DT_2 = data.table(V1 = c("b","c"),foo = c(4,2))

DT_1_df = data.frame(x = rep(c("a","b","c"), each = 3), y = c(1,3,6), v = 1:9)
DT_2_df = data.frame(V1 = c("b","c"),foo = c(4,2))

data.table way

Joining the two data.tables the data.table way gives the following result:

setkey(DT_1, x); setkey(DT_2, V1)
DT_1[DT_2]
  x y v foo
1: b 1 4   4
2: b 3 5   4
3: b 6 6   4
4: c 1 7   2
5: c 3 8   2
6: c 6 9   2
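
A side note on the join above, offered as a sketch rather than as part of the original question: X[Y] keeps every row of Y by default (nomatch = NA), so it coincides with an inner join here only because both keys in DT_2 occur in DT_1. Passing nomatch = 0 drops unmatched rows and gives a strict inner join. DT_3 and its extra key "z" are hypothetical, added only to make the difference visible.

```r
library(data.table)

DT_1 <- data.table(x = rep(c("a", "b", "c"), each = 3),
                   y = rep(c(1, 3, 6), times = 3),
                   v = 1:9)
# DT_3 is hypothetical: like DT_2, plus a key "z" that does not occur in DT_1.
DT_3 <- data.table(V1 = c("b", "c", "z"), foo = c(4, 2, 7))
setkey(DT_1, x); setkey(DT_3, V1)

default_join <- DT_1[DT_3]               # keeps the "z" row, with NA in y and v
inner_only   <- DT_1[DT_3, nomatch = 0]  # drops the unmatched "z" row
```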

dplyr 0.3 inner_join on data.tables

Using dplyr's inner_join on the two data.tables raises an error:

inner_join(DT_1, DT_2, by=("x"="V1"))
Error in setkeyv(x, by$x) : some columns are not in the data.table: V1
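
One plausible reading of this first error (an editorial observation, not something stated in the original post): the call is written by=("x"="V1"), without c. In R, parentheses around "x" = "V1" turn it into an assignment whose value is "V1", so the function would receive by = "V1" and look for a V1 column in DT_1, which matches the message. The sketch below uses only base R; show_by is a hypothetical helper.

```r
# Hypothetical helper that simply returns whatever it receives as `by`.
show_by <- function(by) by

# Parentheses only: assigns "V1" to a variable named x (a side effect in the
# calling environment) and passes the plain string "V1".
without_c <- show_by(by = ("x" = "V1"))

# With c(): a named character vector mapping x to V1.
with_c <- show_by(by = c("x" = "V1"))
```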

dplyr 0.3 inner_join on a data.table and a data.frame

Joining a data.table with a data.frame produces a different error:

inner_join(DT_1, DT_2_df, by = c("x" = "V1"))
Error: Data table joins must be on same key

dplyr 0.3 inner_join on data.frames

However, inner_join works perfectly well on the data.frames:

inner_join(DT_1_df, DT_2_df, by = c("x" = "V1"))
  x y v foo
1 b 1 4   4
2 b 3 5   4
3 b 6 6   4
4 c 1 7   2
5 c 3 8   2
6 c 6 9   2

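Can someone explain why this happens?

1 Answer:

Answer 0: (score: 1)

Posting my findings here for completeness.

After checking https://github.com/hadley/dplyr, it seems that joins on data.tables currently have limited functionality. Quoting: "Currently join variables must be the same in both the left and right hand side." The test below seems to confirm this: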

library(dplyr); library(data.table)
DT_1 = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
DT_2 = data.table(V1=c("b","c"),foo=c(4,2)) # note the variable name assigned to first column
DT_2b = data.table(x=c("b","c"),foo=c(4,2)) # note the variable name assigned to first column

inner_join(DT_1, DT_2b, by= "x")
Source: local data table [6 x 4]
  x y v foo
1 b 1 4   4
2 b 3 5   4
3 b 6 6   4
4 c 1 7   2
5 c 3 8   2
6 c 6 9   2

inner_join(DT_1, DT_2, by = c("x" = "V1"))
Error: Data table joins must be on same key
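
Given that restriction, one possible workaround is to give both sides the same join column name before calling inner_join. The following is a minimal sketch under that assumption, not a fix confirmed by the dplyr authors; DT_2_renamed is a hypothetical name, and copy() is used so the original DT_2 keeps its V1 column.

```r
library(data.table)
library(dplyr)

DT_1 <- data.table(x = rep(c("a", "b", "c"), each = 3),
                   y = rep(c(1, 3, 6), times = 3),
                   v = 1:9)
DT_2 <- data.table(V1 = c("b", "c"), foo = c(4, 2))

# Rename the join column on a copy so both tables share the name "x".
# setnames() modifies by reference and returns the table invisibly.
DT_2_renamed <- setnames(copy(DT_2), "V1", "x")

res <- inner_join(DT_1, DT_2_renamed, by = "x")
```

Converting both inputs with as.data.frame() before the join is another option, since the question shows that the data.frame method accepts by = c("x" = "V1").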