I would like to flag duplicated values with respect to one column.
For example, I have a data frame df:
X Y Z
1 4 5
2 5 7
1 3 6
7 2 7
I want a new data frame df2 with an added column dup that indicates whether the value in X is duplicated:
X Y Z dup
1 4 5 TRUE
2 5 7 FALSE
1 3 6 TRUE
7 2 7 FALSE
Could anyone tell me how to do this?
Answer 0 (score: 1)
You can do that with data.table, grouping by your common field and checking whether each group has more than one row:
library(data.table)
dt <- fread("X Y Z
1 4 5
2 5 7
1 3 6
7 2 7")
dt[, dup := .N > 1, by = X]
X Y Z dup
1: 1 4 5 TRUE
2: 2 5 7 FALSE
3: 1 3 6 TRUE
4: 7 2 7 FALSE
Answer 1 (score: 1)
Here's a method using ave():
df$dup <- ave(df$X, df$X, FUN = length) > 1L
df
## X Y Z dup
## 1 1 4 5 TRUE
## 2 2 5 7 FALSE
## 3 1 3 6 TRUE
## 4 7 2 7 FALSE
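To see why this works, it may help to look at the intermediate result: ave() applies FUN to each group of X and returns the result aligned with the original rows, so with FUN = length every row gets the size of its group. A minimal sketch, assuming the df from the question is rebuilt by hand (the name counts is only for illustration):

```r
# Rebuild the example data frame from the question
df <- data.frame(X = c(1, 2, 1, 7), Y = c(4, 5, 3, 2), Z = c(5, 7, 6, 7))

# Group size of X, repeated for every row of that group
counts <- ave(df$X, df$X, FUN = length)

# A value is duplicated when its group has more than one row
df$dup <- counts > 1L
print(df)
```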
Answer 2 (score: 1)
Using duplicated() from base R:
df2 <- df
df2$dup <- duplicated(df2$X) | duplicated(df2$X, fromLast = TRUE)
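Both calls are needed because duplicated() on its own flags only the second and later occurrences of a value. Scanning again with fromLast = TRUE flags the earlier occurrences, and combining the two with | marks every row whose X appears more than once. A small sketch, assuming the df from the question is rebuilt by hand (forward and backward are illustrative names):

```r
# Rebuild the example data frame from the question
df <- data.frame(X = c(1, 2, 1, 7), Y = c(4, 5, 3, 2), Z = c(5, 7, 6, 7))

# Scanning from the top marks only the later occurrence of 1 (row 3)
forward <- duplicated(df$X)

# Scanning from the bottom marks only the earlier occurrence of 1 (row 1)
backward <- duplicated(df$X, fromLast = TRUE)

# Either direction flagged means the value occurs more than once
df2 <- df
df2$dup <- forward | backward
print(df2)
```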