allow.cartesian error even though I have no duplicates

Date: 2014-10-09 15:24:49

Tags: r data.table

I have two data.table objects like these:

Profile table

> profile
                id cat_id
    1: -HcDR55tvHU      1
    2: -SZ3w0Vs_Ww      1
    3: -UGaEjF4yPo      1
    4: -iJG24SZJ20      1
    5: -veBNWFcvcI      1
   ---                   
45832: zyOY8uqweaA     29
45833: zyR1T15yl58     29
45834: zyu8OPWhJoA     29
45835: zzN9id8zUcs     29
45836: zzjpVRq8bXM     29

> key(profile)
[1] "cat_id"

Category table

> head(cat)
   id         category
1:  1 Film & Animation
2:  2 Autos & Vehicles
3: 10            Music
4: 15   Pets & Animals
5: 17           Sports
6: 18     Short Movies

> key(cat)
[1] "id"

I am trying to join these two tables. The expected output should look like this:

                id cat_id         category
    1: -HcDR55tvHU      1 Film & Animation
    2: -SZ3w0Vs_Ww      1 Film & Animation
    3: -UGaEjF4yPo      1 Film & Animation
    4: -iJG24SZJ20      1 Film & Animation
    5: -veBNWFcvcI      1 Film & Animation
   ---                   
45832: zyOY8uqweaA     29      bla bla bla
45833: zyR1T15yl58     29      bla bla bla
45834: zyu8OPWhJoA     29      bla bla bla
45835: zzN9id8zUcs     29      bla bla bla
45836: zzjpVRq8bXM     29      bla bla bla

When I run data <- profile[cat], I get this strange error about duplicate keys:

Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x),  : 
  Join results in 45853 rows; more than 45836 = max(nrow(x),nrow(i)).
Check for duplicate key values in i, each of which join to the same group in x over
and over again. If that's ok, try including `j` and dropping `by` (by-without-by) so
that j runs for each group to avoid the large allocation. If you are sure you wish to
proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error
message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
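
One way to investigate the message, sketched on small made-up tables. The toy data and the names `dup_ids`, `unmatched`, `n_expected` and `joined` are assumptions for illustration, not from the original post. The error reports 45853 rows against 45836. With the default `nomatch = NA`, every key value in `i` without a match in `x` still contributes one `NA` row, so unmatched categories are a possible cause besides true duplicates. Checks along these lines would tell the two cases apart:

```r
library(data.table)

# Toy stand-ins for the real tables (hypothetical data)
profile <- data.table(id = c("a", "b", "c", "d"),
                      cat_id = c(1L, 1L, 2L, 2L),
                      key = "cat_id")
cat <- data.table(id = c(1L, 2L, 10L),
                  category = c("Film & Animation", "Autos & Vehicles", "Music"),
                  key = "id")

# 1. Are there duplicate key values in i (cat)?
dup_ids <- cat[, .N, by = id][N > 1L]

# 2. Which key values in cat never occur in profile?
unmatched <- setdiff(cat$id, profile$cat_id)

# 3. Row count expected from profile[cat] when cat has no duplicate ids:
#    every matching profile row, plus one NA row per unmatched category
n_expected <- sum(profile$cat_id %in% cat$id) + length(unmatched)

# allow.cartesian = TRUE only lifts the size check so the join can be inspected
joined <- profile[cat, allow.cartesian = TRUE]

print(dup_ids)
print(unmatched)
print(c(expected = n_expected, actual = nrow(joined)))
```

If `dup_ids` is empty and `unmatched` is not, the extra rows would come from categories that no profile uses rather than from duplicates.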

Then I tried something much simpler, which I believe is equivalent:

> test <- data.table("x"=rep(1:5, each=10), "y"=rnorm(50))
> test2 <- data.table("id"=1:5, "name"=c("a", "b", "c", "d", "e"))
> setkey(test, x)
> setkey(test2, id)
> test[test2]
    x           y name
 1: 1  0.85078369    a
 2: 1 -0.56896642    a
 3: 1 -0.12108724    a
 4: 1  0.09204798    a
 5: 1 -1.48852315    a
 6: 1  0.18614002    a

Why does the latter work but not the former?
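
One difference worth noting: in the toy example every `id` in `test2` has a match in `test`, so the join returns exactly `nrow(test)` rows. If some categories in `cat` have no profiles, `profile[cat]` would return extra `NA` rows and exceed that limit. The following is a minimal sketch of two ways to get the expected output under that assumption, using made-up data. The `on =` argument assumes data.table 1.9.6 or later, which is newer than this question.

```r
library(data.table)

# Toy stand-ins for the real tables (hypothetical data)
profile <- data.table(id = c("a", "b", "c", "d"),
                      cat_id = c(1L, 1L, 2L, 2L),
                      key = "cat_id")
cat <- data.table(id = c(1L, 2L, 10L),
                  category = c("Film & Animation", "Autos & Vehicles", "Music"),
                  key = "id")

# Option 1: inner join on the keys; unmatched categories are dropped
# instead of producing NA rows
inner <- profile[cat, nomatch = 0L]

# Option 2: update join; adds the category column to profile by reference,
# so the number of rows in profile cannot change
profile[cat, category := i.category, on = c(cat_id = "id")]

print(inner)
print(profile)
```

Both keep one row per profile. Option 2 also keeps any profile rows whose `cat_id` is missing from `cat`, with `category` set to `NA`.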

0 answers:

No answers