How to fuzzy match text in a column and then replace with a consensus in R

Time: 2016-03-16 11:33:03

Tags: r fuzzy-comparison stringdist

I have a data frame as follows:

FName  LName  
Ayeko   Seki
Ayeko   Seki
Ayeko   Seki
Ayeko   Zeki
Aveko   Seki
Avoo    Zooki
Jacques Bergmann.
Jacques Burgman
J       Bergman
Jacques Bergmann
Jacques Bergmann
Jacques Bergmann
Jacques Bergmann
David   Goliath

J Bergman, Jacques Bergmann., Jacques Burgman and Jacques Bergmann are the same person, as are the first five entries, but the sixth entry (Avoo Zooki) and the last (David Goliath) are different people. I would like to fuzzy match the names across the two columns and replace each group of matches with a consensus (or, alternatively, the most common spelling among the fuzzy matches), so that the output data frame is:

FName  LName  
Ayeko   Seki
Avoo    Zooki
Jacques Bergmann
David   Goliath

I have tried using stringdist(), but I am stuck on a) getting the consensus match and b) replacing the matches with that consensus.
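
To make the two steps concrete, here is a minimal sketch of one possible approach, not a definitive solution: cluster the full names by edit distance, then replace every member of a cluster with its most frequent spelling. It uses base R's adist() (Levenshtein distance) so that it runs without extra packages; stringdist::stringdistmatrix() could be substituted for the distance step. The threshold max_dist, the helper consensus_row, and the choice of complete linkage are assumptions made for illustration and would need tuning on real data.

```r
# Sample data from the question (note the trailing "." on one last name).
df <- data.frame(
  FName = c("Ayeko", "Ayeko", "Ayeko", "Ayeko", "Aveko", "Avoo",
            "Jacques", "Jacques", "J", "Jacques", "Jacques", "Jacques",
            "Jacques", "David"),
  LName = c("Seki", "Seki", "Seki", "Zeki", "Seki", "Zooki",
            "Bergmann.", "Burgman", "Bergman", "Bergmann", "Bergmann",
            "Bergmann", "Bergmann", "Goliath"),
  stringsAsFactors = FALSE
)

# Strip stray trailing punctuation so "Bergmann." equals "Bergmann".
df$LName <- sub("[[:punct:]]+$", "", df$LName)
full <- paste(df$FName, df$LName)

# Pairwise Levenshtein distances, then complete-linkage clustering.
d <- as.dist(adist(full))
max_dist <- 2  # assumed threshold (hypothetical); tune for your data
cluster <- cutree(hclust(d, method = "complete"), h = max_dist)

# Hypothetical helper: row index of the most frequent full name in a cluster.
consensus_row <- function(idx) {
  counts <- table(full[idx])
  idx[match(names(counts)[which.max(counts)], full[idx])]
}
rep_idx <- vapply(split(seq_along(full), cluster), consensus_row, integer(1))

# Replace every row with its cluster's consensus, then de-duplicate.
cleaned <- df[rep_idx[as.character(cluster)], ]
rownames(cleaned) <- NULL
result <- unique(cleaned)
rownames(result) <- NULL
print(result)
```

With the question's data and this threshold, the sketch collapses the Ayeko/Aveko Seki/Zeki variants to Ayeko Seki and the Bergmann/Burgman variants to Jacques Bergmann, while keeping Avoo Zooki and David Goliath separate. It does not merge "J Bergman": that entry is 7 edits away from "Jacques Bergmann", which is further than "Avoo Zooki" is from the Ayeko entries, so no single edit-distance threshold on the full name gives both results. Abbreviated first names would probably need an extra rule, for example comparing last names together with the first initial.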

0 answers:

No answers