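I have a data frame and I am trying to mutate a new column that flags duplicates with 1 and 0. For example, I have the following data frame: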
df4 <- data.frame(emp_id =c("DEV-2962","KTN_2252","ANA2719","ITI_2624","DEV2698","HRT2921","","KTN2624","DEV2698","ITI2535","DEV2698","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
email = c("akash.dev@abcd.com","rahul.singh@abcd.com","salman.abbas@abcd.com","ram.lal@abcd.com","ram.lal@xyz.com","prabal.garg@xyz.com","sanu.ali@abcd.com","kunal.singh@abcd.com","lakhan.tomar@abcd.com","praveen.thakur@abcd.com","sarman.ali@abcd.com","zuber.khan@dkl.com","giriraj.singh@dkl.com","lokesh.sharma@abcd.com","pooja.pawar@abcd.com","nikita.sharma@abcd.com"))
Now I want to add a new column that flags the duplicates.
ID = "emp_id"
Email = "email"
ID <- sym(ID)
Email <- sym(email)
df4 <- df4 %>% filter(!is.na(!!Email)) %>% group_by(!!Email) %>%
mutate(Flag=1:n(),`Duplicate_email`=ifelse(Flag==1,0,1)) %>% select(-Flag) %>% ungroup(.)
But this creates a new column using email; I want to create a new column that is set to 1 when a duplicate is found.
Answer 0: (score: 0)
We can use the duplicated function to achieve what you want:
library(dplyr) ## provides %>%, filter, mutate and sym
ID = "emp_id"
Email = "email"
ID <- sym(ID)
Email <- sym(Email) ## match the variable name above
df4 <- df4 %>% filter(!is.na(!!Email)) %>%
mutate(`Duplicate_email` = as.integer(duplicated(!!Email)))
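To show what duplicated returns, here is a minimal base R sketch on a made-up vector. The vector emails is purely illustrative and is not taken from df4. The first occurrence of each value is marked FALSE and every later repeat is marked TRUE, and as.integer turns that into a 0/1 flag.

```r
# Hypothetical example vector, not taken from df4
emails <- c("a@x.com", "b@x.com", "a@x.com", "c@x.com", "a@x.com")

# duplicated() is FALSE for the first occurrence and TRUE for later repeats
dup <- duplicated(emails)

# Convert the logical vector to a 0/1 integer flag
flag <- as.integer(dup)
print(flag)
```

Note that the first occurrence of a repeated value stays at 0, which matches what the grouped Flag == 1 logic in the question does.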
Answer 1: (score: 0)
Here you can see where the error is.
library(dplyr)
ID = "emp_id"
Email = "email"
ID <- sym(ID)
Email <- sym(Email) ## you wrote: Email <- sym(email)
df4 %>%
filter(!is.na(!!Email)) %>%
group_by(!!Email) %>%
mutate(Flag=1:n(),
`Duplicate_email`= ifelse(Flag==1,0,1)) %>%
select(-Flag) %>%
ungroup(.)
Your code can be simplified this way.
It is more compact, more readable, and faster, yet it produces exactly the same result.
df4 %>%
filter(!is.na(!!Email)) %>%
mutate(Duplicate_email = +duplicated(!!Email))
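As a rough check that the two approaches agree, the sketch below compares them in base R on a made-up vector. The vector x is hypothetical, and ave is used here only as a stand-in for the group_by / 1:n() step, so this is an illustration rather than the exact dplyr pipeline.

```r
# Hypothetical example vector, not taken from df4
x <- c("a", "b", "a", "c", "b", "a")

# Grouped approach: position of each value within its group,
# then flag every position after the first
pos_in_group <- ave(seq_along(x), x, FUN = seq_along)
flag_grouped <- ifelse(pos_in_group == 1, 0, 1)

# Simplified approach: unary + coerces the logical vector to integer
flag_simple <- +duplicated(x)

# Same values; only the storage type differs (double vs. integer)
print(all(flag_grouped == flag_simple))
```

The flags are equal value for value. The only difference is that ifelse(Flag == 1, 0, 1) yields doubles while +duplicated(...) yields integers.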