Remove erroneous codes and their associated records from an R data.table

Time: 2017-06-09 18:18:52

Tags: r dataframe data.table

I have a data.table in R, say dt, that looks like this:

> dt <- data.table(adr = c("A", "A", "A","A","A","A","A","B", "B", "C", "C", "C", "D", "E", "E"),
                  code=c("0001","0001","0001","0001","0001","0001","0001","0001","0001", "0002", "0002", "0002", "0003", "0003", "0003"),
                  num = c(1,67,875,467,986,34,987,876,785, 67,9078,45,907,451,987))
> dt
    adr code  num
 1:   A 0001    1
 2:   A 0001   67
 3:   A 0001  875
 4:   A 0001  467
 5:   A 0001  986
 6:   A 0001   34
 7:   A 0001  987
 8:   B 0001  876
 9:   B 0001  785
10:   C 0002   67
11:   C 0002 9078
12:   C 0002   45
13:   D 0003  907
14:   E 0003  451
15:   E 0003  987

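Each value of code should have only one value of adr. For example, code = 0001 has two adr values, A and B, which is wrong. The correct adr (along with its associated records) is the one that accounts for the majority of the rows for that code (more than 50%).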

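So for code 0001, adr A appears 7 times and adr B appears 2 times, which means adr B and its associated records are wrong. I want to find such cases and remove the erroneous records for every code.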
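The counts described above can be checked directly. This is a minimal sketch, assuming the dt defined in the question; the column names n and share and the table name counts are illustrative and not part of the original post.

```r
library(data.table)

dt <- data.table(
  adr  = c("A","A","A","A","A","A","A","B","B","C","C","C","D","E","E"),
  code = c("0001","0001","0001","0001","0001","0001","0001","0001","0001",
           "0002","0002","0002","0003","0003","0003"),
  num  = c(1, 67, 875, 467, 986, 34, 987, 876, 785, 67, 9078, 45, 907, 451, 987)
)

# Number of rows for each (code, adr) pair
counts <- dt[, .(n = .N), by = .(code, adr)]

# Fraction of each code's rows that belongs to each adr
counts[, share := n / sum(n), by = code]
counts
```

Any adr whose share within its code is 0.5 or less is a candidate for removal under the rule stated above.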

The output must look like this:

> dt
        adr code  num
     1:   A 0001    1
     2:   A 0001   67
     3:   A 0001  875
     4:   A 0001  467
     5:   A 0001  986
     6:   A 0001   34
     7:   A 0001  987
     8:   C 0002   67
     9:   C 0002 9078
    10:   C 0002   45
    11:   E 0003  451
    12:   E 0003  987

How can I do this in R using data.table?

1 answer:

Answer 0: (score: 0)

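I made dt a data.frame() rather than a data.table() so that I didn't have to load another package, but you can do it as follows with dplyr. Note that this keeps the most frequent adr for each code (including ties) rather than strictly checking for more than 50%: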

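library(dplyr)

# Count rows per (code, adr) pair, then keep only the most frequent adr within each code.
dt <- dt %>% group_by(code, adr) %>% mutate(count = n()) %>% group_by(code) %>% filter(count == max(count)) %>% ungroup() %>% select(-count)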
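
Since the question asks for data.table specifically, the same idea can be sketched there too. This is only a sketch and not part of the original answer: it assumes the dt from the question, and the helper column n and the result name cleaned are illustrative. It applies the "more than 50%" rule literally, so a code with no majority adr would lose all of its rows, whereas a max-count filter would keep every tied adr.

```r
library(data.table)

dt <- data.table(
  adr  = c("A","A","A","A","A","A","A","B","B","C","C","C","D","E","E"),
  code = c("0001","0001","0001","0001","0001","0001","0001","0001","0001",
           "0002","0002","0002","0003","0003","0003"),
  num  = c(1, 67, 875, 467, 986, 34, 987, 876, 785, 67, 9078, 45, 907, 451, 987)
)

# n = number of rows for each (code, adr) pair (illustrative helper column)
dt[, n := .N, by = .(code, adr)]

# Within each code, keep rows whose adr covers more than half of that code's rows
cleaned <- dt[, .SD[n > .N / 2], by = code]

# Drop the helper column and restore the original column order
cleaned[, n := NULL]
dt[, n := NULL]
setcolorder(cleaned, c("adr", "code", "num"))
cleaned
```

On the sample data this drops the two B rows under code 0001 and the single D row under code 0003, leaving 12 rows.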