Handling and flagging duplicate entries in an R data frame

Time: 2019-05-02 19:14:08

Tags: r

I have a data frame that looks like this:

id Name  Desc
 1 A     abc
 1 A     abc
 1 B     def
 2 C     ghi
 2 D     jkl
 3 E     mno
 4 F     pqr

I want to identify the duplicated IDs and then flag the duplicate entries as follows:

id Name  Desc Person
 1 A     abc  Same Person
 1 A     abc  Same Person
 1 B     def  Different Person
 2 C     ghi  Different Person
 2 D     jkl  Different Person
 3 E     mno  Different Person
 4 F     pqr  Different Person

Please help!

1 Answer:

Answer 0 (score: 3)

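We can use duplicated to build a logical vector that marks every row appearing more than once, convert it to a numeric index by adding 1, and use that index to pick the matching label from a vector of labels.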

df1$Person <- c("Different Person", "Same Person")[(duplicated(df1)|duplicated(df1, 
          fromLast = TRUE)) + 1]
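
To make the one-liner above easier to follow, here is the same logic unrolled into named steps. This is only a sketch: the intermediate names (dup_forward, dup_backward, is_dup, labels) are illustrative and do not come from the original answer, and df1 is rebuilt with data.frame() so the snippet runs on its own.

```r
# Same values as df1 in this answer, rebuilt here so the snippet is self-contained
df1 <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 4L),
  Name = c("A", "A", "B", "C", "D", "E", "F"),
  Desc = c("abc", "abc", "def", "ghi", "jkl", "mno", "pqr"),
  stringsAsFactors = FALSE
)

# duplicated() on its own flags only the second and later copies of a row
dup_forward <- duplicated(df1)

# with fromLast = TRUE it flags every copy except the last one
dup_backward <- duplicated(df1, fromLast = TRUE)

# a row belongs to a set of duplicates if either scan flags it
is_dup <- dup_forward | dup_backward

# FALSE + 1 is 1 and TRUE + 1 is 2, so the logical vector becomes an index
# into the label vector (illustrative name, not from the original answer)
labels <- c("Different Person", "Same Person")
df1$Person <- labels[is_dup + 1]
```

Combining both scans matters because a single call to duplicated would leave the first copy of each repeated row labelled "Different Person".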

Or with dplyr

library(dplyr)
df1 %>% 
  group_by_all() %>%  # group by every column, so identical rows share a group
  mutate(Person = case_when(n() > 1 ~ "Same Person", TRUE ~ "Different Person"))
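
As a possible variant, and assuming dplyr 1.0.0 or later (where across() is available), the same grouping can be written with group_by(across(everything())), followed by ungroup() so the result is a plain, ungrouped table. This is a sketch rather than part of the original answer; the name result is illustrative.

```r
library(dplyr)

# Same values as df1 in this answer, rebuilt here so the snippet is self-contained
df1 <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 4L),
  Name = c("A", "A", "B", "C", "D", "E", "F"),
  Desc = c("abc", "abc", "def", "ghi", "jkl", "mno", "pqr"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  # one group per distinct combination of all columns
  group_by(across(everything())) %>%
  # n() is the size of the current group; more than one row means a repeated entry
  mutate(Person = if_else(n() > 1, "Same Person", "Different Person")) %>%
  # drop the grouping so later operations are not applied per group by accident
  ungroup()
```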

Data

df1 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 3L, 4L), Name = c("A", 
"A", "B", "C", "D", "E", "F"), Desc = c("abc", "abc", "def", 
"ghi", "jkl", "mno", "pqr")), class = "data.frame", row.names = c(NA, 
 -7L))