Question

I have a data frame with a column of names that need to be standardized.

Here is an example:

PatientId<- c(1,1,1,2,2,2)
Visit_Date<- c("28/02/2014", "29/04/2014", "10/02/2014", "25/01/2014", "01/02/2014", "08/01/2014")
ClinicName<- c("A","A","A", "B","B","B")
PractitionerName<- c("Ahmad Mobin", "Amhad Mobin", "Ahmaad Mobin", "Hadley wickham", "Hadley Wuckham", "Hadley Wihcam")

example_df<- cbind(PatientId, Visit_Date, ClinicName, PractitionerName)
example_df<- as.data.frame(example_df)

This is the code I currently use to standardize the names, but I would like to know whether there is a cleaner way to write it:

library(dplyr)
library(stringr)

# Standardize the "Mobin" variants seen in clinic A
example_df1 <- example_df %>%
  filter(str_detect(PractitionerName, "Mobin")) %>%
  filter(ClinicName == "A") %>%
  mutate(PractitionerName = "Ahmad Mobin")

# Now add those changes back to my main dataset `example_df`
temp_df <- example_df %>%
  anti_join(example_df1, by = c("PatientId", "Visit_Date"))
example_df <- rbind(example_df1, temp_df)

# Repeat the above process to standardize "Hadley Wickham"
example_df1 <- example_df %>%
  filter(str_detect(PractitionerName, "Hadley")) %>%
  filter(ClinicName == "B") %>%
  mutate(PractitionerName = "Hadley Wickham")

# Now add those changes back to my main dataset `example_df`
temp_df <- example_df %>%
  anti_join(example_df1, by = c("PatientId", "Visit_Date"))
example_df <- rbind(example_df1, temp_df)

Answer 1

Oh... I realize I didn't read your question properly. I would do this task as follows; if you have many of these to do, you may want to wrap it in a function:

example_df$PractitionerName[grepl(".*Mobin.*", example_df$PractitionerName) & example_df$ClinicName == "A"] <- "Ahmad Mobin"
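Wrapping it in a function could look something like the sketch below. The helper name `standardise_name` and its arguments are made up for illustration, and the data frame is rebuilt here with plain character columns so the snippet runs on its own:

```r
# Hypothetical helper (not from the original answer): replace every name that
# matches `pattern` within one clinic by a single canonical spelling.
standardise_name <- function(df, pattern, clinic, canonical) {
  hit <- grepl(pattern, df$PractitionerName) & df$ClinicName == clinic
  df$PractitionerName[hit] <- canonical
  df
}

# Example data mirroring the question (character columns assumed).
example_df <- data.frame(
  PatientId = c(1, 1, 1, 2, 2, 2),
  ClinicName = c("A", "A", "A", "B", "B", "B"),
  PractitionerName = c("Ahmad Mobin", "Amhad Mobin", "Ahmaad Mobin",
                       "Hadley wickham", "Hadley Wuckham", "Hadley Wihcam"),
  stringsAsFactors = FALSE
)

# One call per practitioner instead of a filter / anti_join / rbind round trip.
example_df <- standardise_name(example_df, "Mobin", "A", "Ahmad Mobin")
example_df <- standardise_name(example_df, "Hadley", "B", "Hadley Wickham")
unique(example_df$PractitionerName)
```

Because the rows are modified in place, the row order of `example_df` is preserved, which the `rbind` approach in the question does not guarantee.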

Answer 2

Depending on the problem, you could also consider using string distance:

library(dplyr)
library(stringdist)
practitioners <- c("Ahmad Mobin", "Hadley Wickham")
# For each name, pick the canonical name with the smallest string distance
example_df %>% 
  mutate(PractitionerName = 
           practitioners[apply(stringdistmatrix(PractitionerName, practitioners), 1, which.min)])

  PatientId Visit_Date ClinicName PractitionerName
1         1 28/02/2014          A      Ahmad Mobin
2         1 29/04/2014          A      Ahmad Mobin
3         1 10/02/2014          A      Ahmad Mobin
4         2 25/01/2014          B   Hadley Wickham
5         2 01/02/2014          B   Hadley Wickham
6         2 08/01/2014          B   Hadley Wickham
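One caveat with a pure nearest-name lookup is that every name gets mapped to some practitioner, even a name that resembles none of them. A possible variation, sketched here with `amatch()` from the same stringdist package, only replaces a name when it is within a maximum distance. The cutoff of 4 edits and the extra name "John Smith" are assumptions chosen for illustration:

```r
library(stringdist)

practitioners <- c("Ahmad Mobin", "Hadley Wickham")
# Two misspellings from the question plus one made-up, unrelated name.
observed <- c("Amhad Mobin", "Hadley Wihcam", "John Smith")

# amatch() returns the index of the closest entry in `practitioners`,
# or NA when no entry is within `maxDist` (4 is an assumed cutoff).
idx <- amatch(observed, practitioners, maxDist = 4)

# Keep the original spelling where no canonical name is close enough.
standardised <- ifelse(is.na(idx), observed, practitioners[idx])
standardised
```

Here the two misspellings are mapped to their canonical names, while "John Smith" is far from both and is left unchanged. A suitable cutoff depends on how long and how similar the real names are, so it is worth checking against your own data.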

dplyr: replacing multiple strings
