Joining two data.tables fails

Asked: 2014-03-27 21:28:41

Tags: r join data.table lookup

I am trying to use a data.table as a lookup table:

> (dt <- data.table(myid=rep(11:12,3),zz=1:6,key=c("myid","zz")))
   myid zz
1:   11  1
2:   11  3
3:   11  5
4:   12  2
5:   12  4
6:   12  6
> (id2name <- data.table(id=11:14,name=letters[1:4],key="id"))
   id name
1: 11    a
2: 12    b
3: 13    c
4: 14    d

What I want is:

> (res <- data.table(myid=rep(11:12,3),zz=1:6,name=rep(letters[1:2],3),key=c("myid","zz")))
   myid zz name
1:   11  1    a
2:   11  3    a
3:   11  5    a
4:   12  2    b
5:   12  4    b
6:   12  6    b

However, the join I tried fails:

> dt[id2name]
Starting binary search ...done in 0 secs
Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x),  : 
  Join results in 8 rows; more than 6 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
Calls: [ -> [.data.table -> vecseq

What am I doing wrong?

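PS. I am happy to get the result some other way. What is the most idiomatic way to do what I want? (dt must remain a data.table, but id2name can be anything that maps ints to something else, as long as it does not assume the ints are vector indices.)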

1 Answer:

Answer 0 (score: 5)

> dt[id2name, allow.cartesian=TRUE, nomatch=0]
   myid zz name
1:   11  1    a
2:   11  3    a
3:   11  5    a
4:   12  2    b
5:   12  4    b
6:   12  6    b

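data.table is trying to save you from yourself in case you inadvertently join on keys with duplicate values. Note that the error message (eventually) tells you what to do if you are sure you know what you are doing.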
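To see where the 8 rows in the error message come from, the sketch below counts the rows of the join with and without nomatch=0. It reuses dt and id2name from the question; the names full, unmatched and matched are introduced here for illustration only. allow.cartesian=TRUE is passed in both calls so that the code also runs on versions that raise the error above.

```r
library(data.table)

dt <- data.table(myid = rep(11:12, 3), zz = 1:6, key = c("myid", "zz"))
id2name <- data.table(id = 11:14, name = letters[1:4], key = "id")

# Default nomatch = NA: every row of id2name appears in the result,
# so ids with no match in dt each contribute one row with zz = NA.
full <- dt[id2name, allow.cartesian = TRUE]
nrow(full)                           # 8 = 6 matched rows + 2 unmatched ids

# The ids that have no partner in dt
unmatched <- full[is.na(zz), myid]   # 13 14

# nomatch = 0 drops those unmatched ids, leaving only the 6 matched rows
matched <- dt[id2name, allow.cartesian = TRUE, nomatch = 0]
nrow(matched)                        # 6
```

So in this example the two extra rows come from ids 13 and 14, which exist in id2name but not in dt, and nomatch=0 is what removes them.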

Alternatively, reverse the join so that each row of dt is looked up in id2name:

> id2name[dt]
   id name zz
1: 11    a  1
2: 11    a  3
3: 11    a  5
4: 12    b  2
5: 12    b  4
6: 12    b  6
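For the idiomatic lookup asked about in the PS, one further option that is not part of the answer above is an update join, which adds the name column to dt by reference and leaves its key and column names untouched. This is a minimal sketch that assumes a data.table version supporting the on= argument (1.9.6 or later, as far as I know), which is newer than the version used in the question.

```r
library(data.table)

dt <- data.table(myid = rep(11:12, 3), zz = 1:6, key = c("myid", "zz"))
id2name <- data.table(id = 11:14, name = letters[1:4], key = "id")

# Update join: match dt$myid against id2name$id and copy the matching
# name into dt in place. i.name refers to the name column of id2name.
# Ids that exist only in id2name (13, 14) are simply ignored.
dt[id2name, on = c(myid = "id"), name := i.name]

dt
key(dt)   # still c("myid", "zz")
```

Because the column is added in place, no cartesian check is involved, and dt keeps the myid column name rather than picking up id as in id2name[dt].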