Join on a non-unique key that is unique in i

Time: 2014-10-29 15:58:25

Tags: r data.table

I have a data.table with a non-unique key:

> dput(sv)
structure(list(kwd = c("a", "a", "b", "b", "c"), pixel = c(1,
2, 1, 2, 2), kpN = c(2L, 2L, 2L, 1L, 1L)), row.names = c(NA,
-5L), class = c("data.table", "data.frame"), .Names = c("kwd",
"pixel", "kpN"), .internal.selfref = <pointer: 0x7fc4aa800778>, sorted = "kwd")
> dput(kwd)
structure(list(kwd = c("a", "b", "c", "z"), kwdN = c(3L, 2L,
1L, 1L)), row.names = c(NA, -4L), class = c("data.table", "data.frame"
), .Names = c("kwd", "kwdN"), .internal.selfref = <pointer: 0x7fc4aa800778>, sorted = "kwd")

Why am I getting this error:

> sv[kwd,kwdN:=kwdN]
Starting bmerge ...done in 0 secs
Error in vecseq(f__, len__, if (allow.cartesian || notjoin) NULL else as.integer(max(nrow(x),  :
  Join results in 6 rows; more than 5 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
Calls: [ -> [.data.table -> vecseq

I was expecting something like this (note that the key is unique in kwd):

   kwd pixel kpN kwdN
1:   a     1   2    3
2:   a     2   2    3
3:   b     1   2    2
4:   b     2   1    2
5:   c     2   1    1

Also, I am pretty sure this used to work.

Is this something that changed in data.table 1.9.4?

How do I get what I want? (kwd[sv] seems to work; is that the new way?)

1 answer:

Answer 0: (score: 1)

Just so this stays answered:

The allow.cartesian feature was implemented after @Roland posted this. For further explanation, see this post.

The cases where allow.cartesian is not needed (and which therefore should not raise the error) are:

  • When i has no duplicates (#742); this was not checked correctly before. Fixed in 1.9.5 (the current development version).

  • When j uses := (#800); the number of rows can never exceed that of x. Fixed in 1.9.5 (the current development version).

  • When the operation is a not-join (or anti-join) (#698); once again, the number of rows can never exceed that of x. Fixed in 1.9.4.
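
For contrast, here is a minimal sketch of a join where the check is genuinely needed. The tables dx and di are made up for illustration (they are not from the question), and the sketch assumes a recent data.table release; the row limit quoted in the error message has differed between versions.

```r
library(data.table)

# Hypothetical tables: the key value "a" appears 3 times in dx and 3 times in di.
dx <- data.table(k = rep("a", 3), xv = 1:3, key = "k")
di <- data.table(k = rep("a", 3), iv = 4:6, key = "k")

# Each of the 3 rows of di matches all 3 rows of dx, giving 9 result rows.
# Without allow.cartesian the join is expected to stop with the
# "Join results in ... rows" error, which is captured here as a string.
err <- tryCatch(dx[di], error = function(e) conditionMessage(e))

# Opting in explicitly allows the multiplicative result.
big <- dx[di, allow.cartesian = TRUE]

# A not-join only drops rows from dx, so it needs no opt-in.
none <- dx[!di]
```

Here duplicates in i are what make the result grow beyond the size of either input; with a unique key in i, as in the question, that cannot happen.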

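In summary, the allow.cartesian error should only occur when it is really necessary. The fixes in 1.9.5 will be available once 1.9.6 is released on CRAN (which should be soon now).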
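
As a sketch of what the question is after, the tables below are rebuilt from the dput output and then joined in two ways. This assumes a data.table version that includes the fixes above; the i. prefix and the uniqueness check are additions for illustration, not part of the original post. On 1.9.4 the error message itself suggests rerunning with allow.cartesian=TRUE, which is not verified here.

```r
library(data.table)

# Rebuild the two tables from the question (values copied from the dput output).
sv <- data.table(kwd = c("a", "a", "b", "b", "c"),
                 pixel = c(1, 2, 1, 2, 2),
                 kpN = c(2L, 2L, 2L, 1L, 1L),
                 key = "kwd")
kwd <- data.table(kwd = c("a", "b", "c", "z"),
                  kwdN = c(3L, 2L, 1L, 1L),
                  key = "kwd")

# The key is unique in i (kwd), so each row of sv gets at most one match.
stopifnot(!anyDuplicated(kwd$kwd))

# Update join: add kwdN to sv by reference.
# i.kwdN makes explicit that the value is taken from i (kwd).
# The unmatched "z" row in kwd does not add anything to sv.
sv[kwd, kwdN := i.kwdN]

# Alternative mentioned in the question: a regular join with sv as i.
# This returns a new table (columns from kwd come first) and leaves sv as is.
res <- kwd[sv]
```

The update join modifies sv in place, while kwd[sv] builds a copy; which one fits depends on whether sv should be changed.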