Check whether a column has duplicates

Time: 2020-10-30 13:25:33

Tags: r dplyr

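I have a data frame and I am trying to mutate a new column that marks duplicates with 1 and everything else with 0. For example, I have the following data frame: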

df4 <- data.frame(emp_id =c("DEV-2962","KTN_2252","ANA2719","ITI_2624","DEV2698","HRT2921","","KTN2624","DEV2698","ITI2535","DEV2698","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
                  email = c("akash.dev@abcd.com","rahul.singh@abcd.com","salman.abbas@abcd.com","ram.lal@abcd.com","ram.lal@xyz.com","prabal.garg@xyz.com","sanu.ali@abcd.com","kunal.singh@abcd.com","lakhan.tomar@abcd.com","praveen.thakur@abcd.com","sarman.ali@abcd.com","zuber.khan@dkl.com","giriraj.singh@dkl.com","lokesh.sharma@abcd.com","pooja.pawar@abcd.com","nikita.sharma@abcd.com"))

Now I want to mutate a new column that highlights the duplicates.

ID = "emp_id"
Email = "email"

ID <- sym(ID)
Email <- sym(email)

df4 <- df4 %>% filter(!is.na(!!Email)) %>% group_by(!!Email) %>%
   mutate(Flag=1:n(),`Duplicate_email`=ifelse(Flag==1,0,1)) %>% select(-Flag) %>% ungroup(.)

However, this creates a new column based on email. What I want is a new column that is set to 1 whenever a duplicate is found.

2 answers:

Answer 0 (score: 0)

We can use the duplicated function to get what you want:

library(dplyr)
ID = "emp_id"
Email = "email"

ID <- sym(ID)
Email <- sym(Email) ## match the variable name above

df4 <- df4 %>% filter(!is.na(!!Email)) %>%
   mutate(`Duplicate_email` = as.integer(duplicated(!!Email)))
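A quick check on the sample data from the question, offered as a sketch: every email there is unique, so Duplicate_email comes out as 0 on every row, while the emp_id value DEV2698 appears three times. Assuming you also want to flag repeated IDs (the ID symbol is defined in the question but never used), the same duplicated call can be applied to it. Duplicate_id is a column name introduced here only for illustration.

```r
library(dplyr)

# Sample data copied from the question
df4 <- data.frame(
  emp_id = c("DEV-2962","KTN_2252","ANA2719","ITI_2624","DEV2698","HRT2921","",
             "KTN2624","DEV2698","ITI2535","DEV2698","HRT2837","ERV2951",
             "KTN2542","ANA2813","ITI2210"),
  email = c("akash.dev@abcd.com","rahul.singh@abcd.com","salman.abbas@abcd.com",
            "ram.lal@abcd.com","ram.lal@xyz.com","prabal.garg@xyz.com",
            "sanu.ali@abcd.com","kunal.singh@abcd.com","lakhan.tomar@abcd.com",
            "praveen.thakur@abcd.com","sarman.ali@abcd.com","zuber.khan@dkl.com",
            "giriraj.singh@dkl.com","lokesh.sharma@abcd.com","pooja.pawar@abcd.com",
            "nikita.sharma@abcd.com")
)

ID <- sym("emp_id")
Email <- sym("email")

flagged <- df4 %>%
  filter(!is.na(!!Email)) %>%
  # duplicated() is TRUE for the 2nd and later occurrences of a value
  mutate(Duplicate_email = as.integer(duplicated(!!Email)),
         Duplicate_id    = as.integer(duplicated(!!ID)))  # hypothetical extra column

print(flagged)
```

Here only rows 9 and 11 (the second and third DEV2698) get Duplicate_id = 1; the first DEV2698 in row 5 stays 0, because duplicated() treats the first occurrence as the original.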

Answer 1 (score: 0)

Code review

Here is where the error was:

library(dplyr)

ID = "emp_id"
Email = "email"

ID <- sym(ID)
Email <- sym(Email) ## you wrote: Email <- sym(email)

df4 %>%
  filter(!is.na(!!Email)) %>% 
  group_by(!!Email) %>%
  mutate(Flag=1:n(),
         `Duplicate_email`= ifelse(Flag==1,0,1)) %>% 
  select(-Flag) %>% 
  ungroup(.)
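If you keep the grouped approach, row_number() can stand in for the Flag = 1:n() helper, and n() > 1 gives a variant that flags every copy of a repeated email, including the first one. A minimal sketch on a small made-up data frame; Later_copy and Any_copy are hypothetical column names.

```r
library(dplyr)

# Made-up data: "a@x.com" appears twice
emails <- data.frame(email = c("a@x.com", "b@x.com", "a@x.com", "c@x.com"))

Email <- sym("email")

all_copies <- emails %>%
  group_by(!!Email) %>%
  mutate(
    # 1 for the 2nd and later rows of each email, same as Flag == 1 -> 0 else 1
    Later_copy = as.integer(row_number() > 1),
    # 1 for every row whose email occurs more than once
    Any_copy   = as.integer(n() > 1)
  ) %>%
  ungroup()

print(all_copies)
```

Later_copy reproduces the original ifelse logic, while Any_copy marks both "a@x.com" rows.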

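Simplification

Your code can be simplified like this.

It is more compact and more readable, and because it skips the group_by step it is usually faster too. The 0/1 values are the same, although this version returns integers where ifelse() returned doubles.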

df4 %>% 
  filter(!is.na(!!Email)) %>% 
  mutate(Duplicate_email = +duplicated(!!Email))
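The + in front of duplicated() is unary plus: arithmetic on a logical vector coerces it to integer, so +duplicated(x) is a short spelling of as.integer(duplicated(x)). A small check on a toy vector:

```r
x <- c("a", "b", "a", "c", "b")

dup  <- duplicated(x)   # logical: FALSE FALSE TRUE FALSE TRUE
flag <- +dup            # unary + coerces the logical vector to integer

print(class(flag))      # "integer"
print(flag)             # 0 0 1 0 1

# Same values and type as the explicit conversion in the other answer
stopifnot(identical(flag, as.integer(dup)))
```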