Find all records that have multiple values in a column in R

Time: 2018-06-21 08:21:52

Tags: r multiple-columns record

For the example data frame:

df <- structure(list(code = c("a1", "a1", "b2", "v4", "f5", "f5", "h7", 
       "a1"), name = c("katie", "katie", "sally", "tom", "amy", "amy", 
       "ash", "james"), number = c(3.5, 3.5, 2, 6, 4, 4, 7, 3)), .Names = c("code", 
       "name", "number"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
       -8L), spec = structure(list(cols = structure(list(code = structure(list(), class = c("collector_character", 
       "collector")), name = structure(list(), class = c("collector_character", 
       "collector")), number = structure(list(), class = c("collector_double", 
       "collector"))), .Names = c("code", "name", "number")), default = structure(list(), class = c("collector_guess", 
       "collector"))), .Names = c("cols", "default"), class = "col_spec"))

I want to highlight all records whose "code" value occurs two or more times. I know I can use:

df[duplicated(df$name), ]

But this only highlights the duplicated records, whereas I want every record with a repeated code value (i.e. the 3 a1s and the 2 f5s).

Any ideas?

3 answers:

Answer 0: (score: 8)

df[duplicated(df$code) | duplicated(df$code, fromLast=TRUE), ]
  code  name number
1   a1 katie    3.5
2   a1 katie    3.5
5   f5   amy    4.0
6   f5   amy    4.0
8   a1 james    3.0
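
To see why the two calls are combined: `duplicated()` flags only the second and later occurrences of a value, while `fromLast = TRUE` scans from the end and so flags every occurrence except the last. A minimal sketch on the `code` column alone (the vector `codes` is re-typed here purely for illustration and is not part of the original answer):

```r
# The `code` column from the question, re-typed as a plain vector (illustrative)
codes <- c("a1", "a1", "b2", "v4", "f5", "f5", "h7", "a1")

# TRUE for every occurrence after the first one of its value
forward <- duplicated(codes)

# TRUE for every occurrence before the last one of its value
backward <- duplicated(codes, fromLast = TRUE)

# The union marks every element whose value appears more than once
which(forward | backward)
# → 1 2 5 6 8
```

Neither call alone is enough: the forward scan misses element 1 and 5, and the backward scan misses elements 6 and 8.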

Another solution, inspired by Alok VS:

ta <- table(df$code)
df[df$code %in% names(ta)[ta > 1], ]
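
As a rough illustration of the intermediate objects in this approach, here is the same logic on the `code` column alone. The names `codes`, `repeated`, and `keep` are introduced only for this sketch:

```r
# The `code` column from the question, re-typed as a plain vector (illustrative)
codes <- c("a1", "a1", "b2", "v4", "f5", "f5", "h7", "a1")

# Count how many times each distinct code occurs
ta <- table(codes)

# Names of the codes that occur more than once
repeated <- names(ta)[ta > 1]

# Logical mask selecting every element that carries a repeated code
keep <- codes %in% repeated
which(keep)
# → 1 2 5 6 8
```

Here `repeated` holds `"a1"` and `"f5"`, so the mask picks out the same rows as the `duplicated()` solution.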

Edit: If you are OK with going beyond base R, gdata::duplicated2() allows for more concision.

library(gdata)
df[duplicated2(df$code), ]

Answer 1: (score: 2)

Turn the indices into values, then check whether "code" matches those values:

 df[df$code %in% df$code[duplicated(df$code)], ]
  code  name number
1   a1 katie    3.5
2   a1 katie    3.5
5   f5   amy    4.0
6   f5   amy    4.0
8   a1 james    3.0
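
The two steps can be sketched separately on the `code` column alone (the names `codes` and `dup_values` are assumptions made for this illustration):

```r
# The `code` column from the question, re-typed as a plain vector (illustrative)
codes <- c("a1", "a1", "b2", "v4", "f5", "f5", "h7", "a1")

# Step 1: turn the logical index from duplicated() into the values themselves
dup_values <- codes[duplicated(codes)]
dup_values
# → "a1" "f5" "a1"

# Step 2: keep every element whose value is among those duplicated values
which(codes %in% dup_values)
# → 1 2 5 6 8
```

`dup_values` may list a value more than once (here `"a1"` twice); `%in%` only tests membership, so that does not affect the result.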

Answer 2: (score: 1)

I came up with a rough solution:

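# Count occurrences of each code; the result has columns Group.1 and x
temp <- aggregate(df$code, by = list(df$code), FUN = length)
# Keep only the codes that occur more than once
temp <- temp[temp$x > 1, ]

df[df$code %in% temp$Group.1, ]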
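
The column names `Group.1` and `x` used above are the defaults that `aggregate()` assigns when it is given an unnamed vector and an unnamed grouping list. A small sketch on the `code` column alone makes the intermediate table visible (the vector `codes` is re-typed here for illustration):

```r
# The `code` column from the question, re-typed as a plain vector (illustrative)
codes <- c("a1", "a1", "b2", "v4", "f5", "f5", "h7", "a1")

# One row per distinct code: Group.1 holds the code, x holds its count
temp <- aggregate(codes, by = list(codes), FUN = length)

# Keep only the codes that occur more than once
temp <- temp[temp$x > 1, ]

# Positions of all elements carrying one of those codes
which(codes %in% temp$Group.1)
# → 1 2 5 6 8
```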