Check the frequency of a data.table value in another data.table

Time: 2014-10-19 17:26:09

Tags: r data.table

 library(data.table)
 DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
 DT2 <- data.table(group = c("A", "B", "C"))

I want to add a column popular to DT2 whose value is TRUE whenever DT2$group occurs at least twice in DT1$group. So, in the example above, DT2 should be

    group popular
 1:     A    TRUE
 2:     B    TRUE
 3:     C   FALSE

What is an efficient way to do this?

Updated example: DT2 may actually contain more groups than DT1, so here is an updated example:

 DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
 DT2 <- data.table(group = c("A", "B", "C", "D"))

The desired output would be

    group popular
 1:     A    TRUE
 2:     B    TRUE
 3:     C   FALSE
 4:     D   FALSE

2 answers:

Answer 0: (score: 10)

I would just do this:

## 1.9.4+
setkey(DT1, group)
DT1[J(DT2$group), list(popular = .N >= 2L), by = .EACHI]
#    group popular
# 1:     A    TRUE
# 2:     B    TRUE
# 3:     C   FALSE
# 4:     D   FALSE ## on the updated example

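data.table's join syntax is quite powerful in that, while joining, you can also aggregate, select, or update columns in j. Here we perform a join: for each row of DT2$group, we evaluate the j-expression .N >= 2L on the matching rows of DT1. Specifying by = .EACHI (see the 1.9.4 NEWS) makes j run once per row of i rather than once over all matches.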
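As a sketch of the same join without setting a key first: this assumes a data.table version that supports the on= argument (1.9.6 or later, as far as I know), which the original answer did not use. The variable name res is only for illustration.

```r
library(data.table)

DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
DT2 <- data.table(group = c("A", "B", "C", "D"))

# Ad hoc join on "group": DT2 supplies the rows of i, and by = .EACHI
# evaluates j once per row of DT2, so .N counts the matching DT1 rows.
# Assumption: on= is available in the installed data.table version.
res <- DT1[DT2, on = "group", .(popular = .N >= 2L), by = .EACHI]
print(res)
```

Because no key is set, DT1 keeps its original row order, which may matter if later code relies on it.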


In 1.9.4, .() was introduced as an alias in all of i, j and by. So you can also do:

DT1[.(DT2$group), .(popular = .N >= 2L), by = .EACHI]

When you join on a single character column, you can drop the .() / J() syntax altogether (for convenience). So this can also be written as:

DT1[DT2$group, .(popular = .N >= 2L), by = .EACHI]
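A quick way to convince yourself that the three spellings behave the same is to run them side by side on the updated example. This is only a sanity-check sketch; the names r1, r2 and r3 are made up for illustration.

```r
library(data.table)

DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
DT2 <- data.table(group = c("A", "B", "C", "D"))

# All three forms below join on the key of DT1, so the key must be set.
setkey(DT1, group)

r1 <- DT1[J(DT2$group), list(popular = .N >= 2L), by = .EACHI]  # J() form
r2 <- DT1[.(DT2$group), .(popular = .N >= 2L), by = .EACHI]     # .() alias
r3 <- DT1[DT2$group, .(popular = .N >= 2L), by = .EACHI]        # bare character vector

# Compare the computed column across the three forms.
print(identical(r1$popular, r2$popular) && identical(r2$popular, r3$popular))
```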

Answer 1: (score: 3)

I would do it this way: first count how many times each group appears in DT1, then simply join DT2 with DT1:

require(data.table)
DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
DT2 <- data.table(group = c("A", "B", "C"))

#solution:
DT1[,num_counts:=.N,by=group] #the number of entries in this group, just count the other column
setkey(DT1, group)
setkey(DT2, group)
DT2 = DT1[DT2,mult="last"][,list(group, popular = (num_counts >= 2))]

#> DT2
#   group popular
#1:     A    TRUE
#2:     B    TRUE
#3:     C   FALSE
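On the updated example, where DT2 also contains D, the join above leaves num_counts as NA for D, so popular would come out as NA rather than FALSE. A possible variant of the same count-then-join idea that handles this is sketched below; it assumes a data.table version with on= support (1.9.6 or later, as far as I know), and the names counts and n are made up for illustration.

```r
library(data.table)

DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
DT2 <- data.table(group = c("A", "B", "C", "D"))

# Count occurrences per group in a separate table, leaving DT1 unchanged.
counts <- DT1[, .(n = .N), by = group]

# Update join: copy each count into DT2 by reference.
# Groups missing from DT1 (here D) are left as NA.
DT2[counts, on = "group", n := i.n]

# Treat a missing count as "not popular", then drop the helper column.
DT2[, popular := !is.na(n) & n >= 2L]
DT2[, n := NULL]
print(DT2)
```

Unlike the version above, this does not add a column to DT1 or set keys on either table.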