Question

I have a data.table with a non-unique key:

> dput(sv)
structure(list(kwd = c("a", "a", "b", "b", "c"), pixel = c(1,
2, 1, 2, 2), kpN = c(2L, 2L, 2L, 1L, 1L)), row.names = c(NA,
-5L), class = c("data.table", "data.frame"), .Names = c("kwd",
"pixel", "kpN"), .internal.selfref = <pointer: 0x7fc4aa800778>, sorted = "kwd")
> dput(kwd)
structure(list(kwd = c("a", "b", "c", "z"), kwdN = c(3L, 2L,
1L, 1L)), row.names = c(NA, -4L), class = c("data.table", "data.frame"
), .Names = c("kwd", "kwdN"), .internal.selfref = <pointer: 0x7fc4aa800778>, sorted = "kwd")

Why am I getting this error:

> sv[kwd,kwdN:=kwdN]
Starting bmerge ...done in 0 secs
Error in vecseq(f__, len__, if (allow.cartesian || notjoin) NULL else as.integer(max(nrow(x),  :
  Join results in 6 rows; more than 5 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
Calls: [ -> [.data.table -> vecseq

I expected something like this:

   kwd pixel kpN kwdN
1:   a     1   2    3
2:   a     2   2    3
3:   b     1   2    2
4:   b     2   1    2
5:   c     2   1    1

Also, I am fairly sure this used to work.

Is this something that changed in data.table 1.9.4?

How do I get what I want? (kwd[sv] seems to work; is that the new way?)

Answer 1

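Just so that this is marked as answered:

The allow.cartesian feature was implemented after @Roland posted this. See this post for additional explanation.

The cases where allow.cartesian is not needed (and which therefore should not error) are: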

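When i has no duplicates, #742. This was not checked properly before. Fixed in 1.9.5 (current development version).
When j uses :=, #800. The number of rows can never exceed that of x. Fixed in 1.9.5 (current development version).
When the operation is a not-join (or anti-join), #698. Here again the number of rows never exceeds that of x. Fixed in 1.9.4.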
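The three cases above can be sketched as follows, assuming a data.table release that includes the fixes listed (1.9.6 or later). The tables are rebuilt from the dput output in the question. The i. prefix is my addition, to make explicit that kwdN comes from kwd; the original call used kwdN := kwdN.

```r
library(data.table)

# Rebuild the two tables from the question, keyed on kwd.
sv <- data.table(kwd   = c("a", "a", "b", "b", "c"),
                 pixel = c(1, 2, 1, 2, 2),
                 kpN   = c(2L, 2L, 2L, 1L, 1L),
                 key   = "kwd")
kwd <- data.table(kwd  = c("a", "b", "c", "z"),
                  kwdN = c(3L, 2L, 1L, 1L),
                  key  = "kwd")

# Join in the other direction: kwd (x) has a unique key, so each row of
# sv (i) matches at most one row and the result has nrow(sv) rows.
joined <- kwd[sv]

# Update join: i (kwd) has no duplicate keys and j uses :=, so sv keeps
# its 5 rows and only gains a column. "z" has no match in sv and is ignored.
sv[kwd, kwdN := i.kwdN]
print(sv)

# Not-join (anti-join): rows of kwd whose key does not appear in sv.
unmatched <- kwd[!sv]
print(unmatched)
```

The kwd[sv] form returns a new table, whereas the := form adds the column to sv by reference, which is closer to what the question asked for.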

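In summary, the allow.cartesian error should occur only when necessary. The fixes in 1.9.5 will be available once 1.9.6 is released on CRAN (which should be soon now).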
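For contrast, here is a minimal sketch of a join where the error is expected: duplicate keys in i that each match the same repeated group in x. The tables x and i below are hypothetical and not taken from the question, and the exact row threshold in the check may differ between data.table versions.

```r
library(data.table)

# Hypothetical tables: the key "a" is repeated three times on both sides.
x <- data.table(k = c("a", "a", "a"), v = 1:3, key = "k")
i <- data.table(k = c("a", "a", "a"), w = 1:3)

# Each of the 3 rows in i matches all 3 rows in x, so the join would
# produce 9 rows. Without allow.cartesian this is expected to stop with
# the "Join results in ... rows" error, which is captured here.
res <- tryCatch(x[i], error = function(e) e)
if (inherits(res, "error")) message(conditionMessage(res))

# Opting in explicitly returns the full 3 x 3 result.
big <- x[i, allow.cartesian = TRUE]
print(nrow(big))
```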

Joining a non-unique key with a unique key in i
