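After using R and data.table for a while, I still run into problems with joins. I previously asked this question and got a satisfactory explanation, but I still don't get the logic. Let's look at some examples: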
library("data.table")
X <- data.table(chiave=c("a", "a", "a", "b", "b"),valore1=1:5)
Y <- data.table(chiave=c("a", "b", "c", "d"),valore2=1:4)
X
chiave valore1
1: a 1
2: a 2
3: a 3
4: b 4
5: b 5
Y
chiave valore2
1: a 1
2: b 2
3: c 3
4: d 4
When I join, I get an error:
setkey(X,chiave)
X[Y]
# Error in vecseq(f__, len__, if (allow.cartesian || notjoin) NULL else as.integer(max(nrow(x), :
Join results in 7 rows; more than 5 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
So I rerun with allow.cartesian:
X[Y, allow.cartesian=TRUE]
chiave valore1 valore2
1: a 1 1
2: a 2 1
3: a 3 1
4: b 4 2
5: b 5 2
6: c NA 3
7: d NA 4
Note that X has duplicate keys while i does not. If I change Y to:
Y <- data.table(chiave=c("b", "c", "d"),valore2=1:3)
Y
chiave valore2
1: b 1
2: c 2
3: d 3
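The join completes with no error message and without needing allow.cartesian, but logically the situation is the same: X has duplicate keys and i does not.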
X[Y]
chiave valore1 valore2
1: b 4 1
2: b 5 1
3: c NA 2
4: d NA 3
On the other hand:
X <- data.table(chiave=c("a", "a", "a", "a", "a", "a", "b", "b"),valore1=1:8)
Y <- data.table(chiave=c("b", "b", "d"),valore2=1:3)
X
chiave valore1
1: a 1
2: a 2
3: a 3
4: a 4
5: a 5
6: a 6
7: b 7
8: b 8
Y
chiave valore2
1: b 1
2: b 2
3: d 3
Here I have duplicate keys in both X and i, yet the join (a Cartesian product) completes with no error message and without needing allow.cartesian:
setkey(X,chiave)
X[Y]
chiave valore1 valore2
1: b 7 1
2: b 8 1
3: b 7 2
4: b 8 2
5: d NA 3
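From my point of view, the warning is needed if and only if there are duplicate keys in both X and i (not merely because the result has more rows than max(nrow(x),nrow(i))), and only in that case do I think allow.cartesian should be required (so not in my first two examples).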
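To make the row-count reasoning concrete, here is a minimal sketch in base R of the check as the error message describes it: the join result has one row per match for each row of i (and one NA row when there is no match), and the error fires when that total exceeds max(nrow(x),nrow(i)). The helper names join_rows and exceeds_limit are hypothetical, introduced only for illustration; this is not data.table's internal code.

```r
# Hypothetical helper: number of rows X[Y] would produce, given the key
# columns of x and i. Each row of i contributes one row per match in x,
# or a single NA row when it has no match.
join_rows <- function(x_keys, i_keys) {
  matches <- vapply(i_keys, function(k) sum(x_keys == k), integer(1))
  sum(pmax(matches, 1L))
}

# Hypothetical helper: the size test quoted in the error message.
exceeds_limit <- function(x_keys, i_keys) {
  join_rows(x_keys, i_keys) > max(length(x_keys), length(i_keys))
}

# The three examples from the question.
ex1 <- exceeds_limit(c("a", "a", "a", "b", "b"), c("a", "b", "c", "d"))
ex2 <- exceeds_limit(c("a", "a", "a", "b", "b"), c("b", "c", "d"))
ex3 <- exceeds_limit(c(rep("a", 6), "b", "b"), c("b", "b", "d"))
print(c(ex1, ex2, ex3))
```

Under these assumptions only the first example exceeds the limit (7 rows against a maximum of 5), even though it is the third example that has duplicates on both sides. That is the mismatch between the size-based check and the duplicate-based intuition described above.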
Answer 0 (score: 2)
Just to keep this answered, this behaviour with allow.cartesian has been fixed in the current development version v1.9.5, and will soon be available on CRAN as v1.9.6. Odd versions are devel, and even versions are stable. From NEWS:
allow.cartesian is ignored during joins when:
- i has no duplicates and mult="all". Closes #742. Thanks to @nigmastar for the report.
- assigning by reference, i.e., j has :=. Closes #800. Thanks to @matthieugomez for the report.
In both these cases (and during a not-join, which was already fixed in 1.9.4), allow.cartesian can be safely ignored.
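As a quick way to see the first NEWS item in action, the sketch below reruns the first example from the question. It assumes data.table v1.9.6 or later is installed; on the older version used in the question, the X[Y] call would still raise the error shown there.

```r
library(data.table)

# First example from the question: duplicates in X, none in Y (the i table).
X <- data.table(chiave = c("a", "a", "a", "b", "b"), valore1 = 1:5)
Y <- data.table(chiave = c("a", "b", "c", "d"), valore2 = 1:4)
setkey(X, chiave)

# i has no duplicate keys, so per the NEWS entry allow.cartesian is ignored.
no_dups_in_i <- anyDuplicated(Y$chiave) == 0L

# Assumes data.table >= 1.9.6: no allow.cartesian argument is needed.
res <- X[Y]
print(res)
```

The result has the same 7 rows that allow.cartesian=TRUE produced in the question, including the two NA rows for the unmatched keys c and d.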