Filter records in one table based on records in another table

Posted: 2017-06-10 13:56:29

Tags: r dataframe data.table

I have two data.tables, dt and dt1, which look like this:

> library(data.table)
> dt <- data.table(grp = c("A", "A", "B", "B", "C"),
                   cat = c("01", "02", "01", "02", "01"),
                   Value = c(234, 234, 235, 536, 235))

> dt
   grp cat Value
1:   A  01   234
2:   A  02   234
3:   B  01   235
4:   B  02   536
5:   C  01   235

> dt1 <- data.table(grp = c("A","A","A","A","A","A","B","B","B", "B","C"),
                   cat = c("01","01","02","02","03","04", "01","01", "02", "03","01"),
                   rec = c(5435,4341, 32525,436,7087,467,523,245,568,24,789),
                   val = c(346,6876,436,6807,465,65875,6432,754,326532,746,578))

> dt1
    grp cat   rec    val
 1:   A  01  5435    346
 2:   A  01  4341   6876
 3:   A  02 32525    436
 4:   A  02   436   6807
 5:   A  03  7087    465
 6:   A  04   467  65875
 7:   B  01   523   6432
 8:   B  01   245    754
 9:   B  02   568 326532
10:   B  03    24    746
11:   C  01   789    578

I want to remove the records from dt1 that have no corresponding grp and cat combination in table dt.

For example, for grp A there are no records in dt associated with cat 03 and 04, so I want to remove those rows from dt1.
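To see exactly which rows of dt1 this requirement targets, an anti-join (x[!i, on = ...] in data.table) lists the rows whose (grp, cat) pair has no partner in dt. This is a minimal sketch, assuming the data.table package is installed; the name to_drop is hypothetical and used only for illustration. On the sample data it should pick out the A/03, A/04 and B/03 rows.

```r
library(data.table)

dt <- data.table(grp = c("A", "A", "B", "B", "C"),
                 cat = c("01", "02", "01", "02", "01"),
                 Value = c(234, 234, 235, 536, 235))
dt1 <- data.table(grp = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
                  cat = c("01", "01", "02", "02", "03", "04", "01", "01", "02", "03", "01"),
                  rec = c(5435, 4341, 32525, 436, 7087, 467, 523, 245, 568, 24, 789),
                  val = c(346, 6876, 436, 6807, 465, 65875, 6432, 754, 326532, 746, 578))

# Anti-join: keep only the rows of dt1 whose (grp, cat) pair does NOT occur in dt.
# These are the rows the question wants to get rid of (to_drop is an illustrative name).
to_drop <- dt1[!dt, on = .(grp, cat)]
print(to_drop)
```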

My final dt1 should look like this:

> dt1
    grp cat   rec    val
 1:   A  01  5435    346
 2:   A  01  4341   6876
 3:   A  02 32525    436
 4:   A  02   436   6807
 5:   B  01   523   6432
 6:   B  01   245    754
 7:   B  02   568 326532
 8:   C  01   789    578

How can I do this with data.table in R?

1 Answer:

Answer (score: 0)

We can join dt1 on the grp and cat columns of dt (dropping its third column, Value):

dt1[dt[, -3], on = .(grp, cat)]
#    grp cat   rec    val
#1:   A  01  5435    346
#2:   A  01  4341   6876
#3:   A  02 32525    436
#4:   A  02   436   6807
#5:   B  01   523   6432
#6:   B  01   245    754
#7:   B  02   568 326532
#8:   C  01   789    578
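The join above returns the matching dt1 rows for every row of dt[, -3]. That works for the sample data, but two edge cases are worth guarding against if the real data differ (an assumption about the data, not something stated in the question): if dt contained a (grp, cat) pair absent from dt1, the default nomatch = NA would add an all-NA row for it, and if dt repeated a pair, the matching dt1 rows would appear twice. A semi-join sketch that de-duplicates the keys with unique() and drops non-matches with nomatch = 0L is shown below; keys, result and dt_extra are illustrative names.

```r
library(data.table)

dt <- data.table(grp = c("A", "A", "B", "B", "C"),
                 cat = c("01", "02", "01", "02", "01"),
                 Value = c(234, 234, 235, 536, 235))
dt1 <- data.table(grp = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
                  cat = c("01", "01", "02", "02", "03", "04", "01", "01", "02", "03", "01"),
                  rec = c(5435, 4341, 32525, 436, 7087, 467, 523, 245, 568, 24, 789),
                  val = c(346, 6876, 436, 6807, 465, 65875, 6432, 754, 326532, 746, 578))

# Distinct (grp, cat) pairs present in dt; unique() prevents duplicated output rows
keys <- unique(dt[, .(grp, cat)])

# Semi-join: keep dt1 rows whose (grp, cat) appears in keys.
# nomatch = 0L drops keys that have no rows in dt1 instead of returning NA rows.
result <- dt1[keys, on = .(grp, cat), nomatch = 0L]
print(result)

# Hypothetical variant of dt with a duplicated key (A/01) and a key missing from dt1 (D/01)
dt_extra <- rbind(dt, data.table(grp = c("A", "D"), cat = c("01", "01"), Value = c(1, 2)))
nrow(dt1[dt_extra[, -3], on = .(grp, cat)])                                # 11: duplicates plus an NA row
nrow(dt1[unique(dt_extra[, .(grp, cat)]), on = .(grp, cat), nomatch = 0L])  # 8: same as result
```

On the original data both approaches give the same eight rows; the semi-join form only matters once dt has repeated or unmatched keys.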