I have a data frame that looks like this:
id Name Desc
1 A abc
1 A abc
1 B def
2 C ghi
2 D jkl
3 E mno
4 F pqr
I want to identify the duplicated IDs and then label the duplicates as follows:
id Name Desc Person
1 A abc Same Person
1 A abc Same Person
1 B def Different Person
2 C ghi Different Person
2 D jkl Different Person
3 E mno Different Person
4 F pqr Different Person
Please help!
Answer 0: (score: 3)
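We can use duplicated to create a logical vector, convert it to a numeric index, and use that index to pick the matching label from a vector of the two values. Because duplicated on its own flags only the second and later copies of a row, it is combined with duplicated(..., fromLast = TRUE) so that the first copy is flagged as well.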
df1$Person <- c("Different Person", "Same Person")[
  (duplicated(df1) | duplicated(df1, fromLast = TRUE)) + 1]
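To make the indexing trick easier to follow, here is the same one-liner broken into steps. This is only a sketch: the intermediate names is_dup, labels, and idx are introduced here for illustration and are not part of the original answer, and df1 is rebuilt inline so the snippet runs on its own.

```r
# Same data as df1 in this answer, rebuilt so the sketch runs standalone
df1 <- data.frame(id   = c(1L, 1L, 1L, 2L, 2L, 3L, 4L),
                  Name = c("A", "A", "B", "C", "D", "E", "F"),
                  Desc = c("abc", "abc", "def", "ghi", "jkl", "mno", "pqr"))

# TRUE for every row that has an exact copy elsewhere in the data frame:
# the first call flags later copies, the second (fromLast) flags earlier ones
is_dup <- duplicated(df1) | duplicated(df1, fromLast = TRUE)

# Adding 1 coerces FALSE/TRUE to 1/2, a valid index into the label vector
# (labels and idx are illustrative names, not from the original answer)
labels <- c("Different Person", "Same Person")
idx <- is_dup + 1

# Pick one label per row
df1$Person <- labels[idx]
print(df1)
```

Note that the comparison is on whole rows, not on id alone, which is why the row with id 1 and Name B is still labelled "Different Person".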
Or with dplyr:
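library(dplyr)
df1 %>%
  # group by every column, so each group is a set of identical rows
  group_by_all() %>%
  # a group with more than one row means the row is duplicated
  mutate(Person = case_when(n() > 1 ~ "Same Person",
                            TRUE ~ "Different Person"))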
Data:
df1 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 3L, 4L),
                      Name = c("A", "A", "B", "C", "D", "E", "F"),
                      Desc = c("abc", "abc", "def", "ghi", "jkl", "mno", "pqr")),
                 class = "data.frame", row.names = c(NA, -7L))