Question

I want to subset a data.table using a list of tuples (multiple columns) taken from another data.table, but I am not sure how to do it.

Subsetting by a single column works:

DT1[col1 %in% DT2$col_1]

What I tried for two columns is

DT1[c(col1, col2) %in% DT2[, c(col_1, col_2)]]

but it does not work. The error is

i evaluates to a logical vector length 91369852 but there are 45684926
rows. Recycling of logical i is no longer allowed as it hides more
bugs than is worth the rare convenience. Explicitly use
rep(...,length=.N) if you really need to recycle.

Any ideas? If %in% is not the right approach, how would you solve this?

Answer 1

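c(col1, col2) concatenates the two columns into a single vector that is twice as long as the table (91369852 = 2 x 45684926), so %in% returns 2 booleans per row. That is why you get this error, and even without the error it would not do what you want. So indeed, %in% is not the way to do this.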
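The length mismatch can be reproduced on a small table. This is only a sketch: the table toy and the values passed to %in% are made up for illustration and are not from the question.

```r
library(data.table)

# Hypothetical toy table, 3 rows
toy <- data.table(col1 = c(1, 2, 3), col2 = c(3, 4, 5))

# c() stacks the two columns end to end: one vector of 2 * nrow(toy) values
stacked <- toy[, c(col1, col2)]
length(stacked)
nrow(toy)

# %in% returns one logical per element of its left-hand side,
# so the mask is also twice as long as the table
mask <- stacked %in% c(1, 4)
length(mask)
```

Because the mask has two entries per row, data.table refuses to use it as a row filter instead of silently recycling it.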

You should combine the two column tests with & to make a double condition:

Here is a reproducible example:

library(data.table)

# Table to be filtered
DT1 = data.table(col1 = c(1,2,3,2,5,1,3,3,1,2),
                 col2 = c(3,4,5,4,3,4,5,3,4,5),
                 col3 = c(1,2,3,4,5,6,7,8,9,10))

# Table holding the (col1, col2) tuples to keep
DT2 = data.table(col1 = c(1,2,1,2,3,4,3,2,4,3),
                 col2 = c(3,4,5,3,6,4,5,4,3,4),
                 col3 = c(11,12,13,14,15,16,17,18,19,20))

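Edit: following the comments, I corrected my answer (this turned out to be trickier than I thought).

I created a filter function that checks whether a row has a match in DT2: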

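# x is one row of DT1 as a vector: x[1] is col1, x[2] is col2.
# Returns TRUE if some row of DT2 has the same (col1, col2) pair.
filter <- function(x){
  any(x[1] == DT2[["col1"]] & x[2] == DT2[["col2"]])
}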

I apply this function to every row of DT1:

indexes = apply(DT1, 1, filter)

Then I filter:

> DT1[indexes, ]
   col1 col2 col3
1:    1    3    1
2:    2    4    2
3:    3    5    3
4:    2    4    4
5:    3    5    7
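The row-by-row apply can become slow on a table with tens of millions of rows, as in the question. A possible alternative, not part of the original answer, is to let data.table match the tuples with a join on both columns. The sketch below reuses the example data; the names hit and result are introduced here only for illustration.

```r
library(data.table)

DT1 <- data.table(col1 = c(1,2,3,2,5,1,3,3,1,2),
                  col2 = c(3,4,5,4,3,4,5,3,4,5),
                  col3 = c(1,2,3,4,5,6,7,8,9,10))
DT2 <- data.table(col1 = c(1,2,1,2,3,4,3,2,4,3),
                  col2 = c(3,4,5,3,6,4,5,4,3,4),
                  col3 = c(11,12,13,14,15,16,17,18,19,20))

# For each row of DT1, look up the first row of DT2 with the same
# (col1, col2) pair. which = TRUE returns DT2 row numbers instead of
# joined data; DT1 rows without a match get NA.
hit <- DT2[DT1, on = .(col1, col2), mult = "first", which = TRUE]

# Keep the DT1 rows that found a match, in their original order
result <- DT1[!is.na(hit)]
result
```

On the example data this should select the rows with col3 equal to 1, 2, 3, 4 and 7, the same rows as the apply version. Using mult = "first" keeps one lookup result per DT1 row, so duplicated tuples in DT2 (such as 2, 4) do not duplicate rows in the output.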

Subsetting by a list of tuples (multiple columns) with an %in% clause