I have two data.table objects like these.
Profile table:
> profile
id cat_id
1: -HcDR55tvHU 1
2: -SZ3w0Vs_Ww 1
3: -UGaEjF4yPo 1
4: -iJG24SZJ20 1
5: -veBNWFcvcI 1
---
45832: zyOY8uqweaA 29
45833: zyR1T15yl58 29
45834: zyu8OPWhJoA 29
45835: zzN9id8zUcs 29
45836: zzjpVRq8bXM 29
> key(profile)
[1] "cat_id"
Category table:
> head(cat)
id category
1: 1 Film & Animation
2: 2 Autos & Vehicles
3: 10 Music
4: 15 Pets & Animals
5: 17 Sports
6: 18 Short Movies
> key(cat)
[1] "id"
I am trying to join these two tables. The expected output should look like this:
id cat_id category
1: -HcDR55tvHU 1 Film & Animation
2: -SZ3w0Vs_Ww 1 Film & Animation
3: -UGaEjF4yPo 1 Film & Animation
4: -iJG24SZJ20 1 Film & Animation
5: -veBNWFcvcI 1 Film & Animation
---
45832: zyOY8uqweaA 29 bla bla bla
45833: zyR1T15yl58 29 bla bla bla
45834: zyu8OPWhJoA 29 bla bla bla
45835: zzN9id8zUcs 29 bla bla bla
45836: zzjpVRq8bXM 29 bla bla bla
When I run data <- profile[cat], I get this strange error about duplicate keys:
Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x), :
Join results in 45853 rows; more than 45836 = max(nrow(x),nrow(i)).
Check for duplicate key values in i, each of which join to the same group in x over
and over again. If that's ok, try including `j` and dropping `by` (by-without-by) so
that j runs for each group to avoid the large allocation. If you are sure you wish to
proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error
message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
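The error text suggests checking for duplicate key values in i, which here is cat. Below is a minimal sketch of such a check. It runs on a made-up stand-in table: cat_demo and its rows are hypothetical, with one id duplicated on purpose, and are not the real data.

```r
library(data.table)

# Hypothetical stand-in for `cat`; the id 10 is duplicated on purpose.
cat_demo <- data.table(
  id = c(1L, 2L, 10L, 10L),
  category = c("Film & Animation", "Autos & Vehicles", "Music", "Music")
)
setkey(cat_demo, id)

# Count rows per key value and keep only the keys that occur more than once.
dup_keys <- cat_demo[, .N, by = id][N > 1]
print(dup_keys)

# Base R alternative: returns 0 when no key value is repeated.
has_dups <- anyDuplicated(cat_demo$id) > 0
print(has_dups)
```

Running the same two checks on the real cat table (replacing cat_demo with cat) should show whether any id appears more than once there.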
Then I tried something very simple that I believe is equivalent:
> test <- data.table("x"=rep(1:5, each=10), "y"=rnorm(50))
> test2 <- data.table("id"=1:5, "name"=c("a", "b", "c", "d", "e"))
> setkey(test, x)
> setkey(test2, id)
> test[test2]
x y name
1: 1 0.85078369 a
2: 1 -0.56896642 a
3: 1 -0.12108724 a
4: 1 0.09204798 a
5: 1 -1.48852315 a
6: 1 0.18614002 a
Why does the latter work while the former does not?