Finding the best matching string in R

Time: 2014-07-06 02:45:44

Tags: r string string-matching fuzzy-search

Starting from this:

L Hernandez

and a vector containing the following:

[1] "HernandezOlaf "    "HernandezLuciano " "HernandezAdrian "

I tried:

'subset(ABC, str_detect(ABC, "L Hernandez") == TRUE)'

The desired output is the Hernandez name that contains a capital L anywhere.

That is, the desired output is HernandezLuciano

3 answers:

Answer 0: (score: 2)

This may help:

vec1 <- c("L Hernandez", "HernandezOlaf ", "HernandezLuciano ", "HernandezAdrian ")
grep("L ?Hernandez|Hernandez ?L", vec1, value = TRUE)
#[1] "L Hernandez" "HernandezLuciano "

Update

variable <- "L Hernandez"

v1 <- gsub(" ", " ?", variable) #replace space with a space and question mark 
v2 <- gsub("([[:alpha:]]+) ([[:alpha:]]+)", "\\2 ?\\1", variable) #reverse the order of words in the string and add question mark

You can also use strsplit to split variable, as @rawr commented.

grep(paste(v1, v2, sep = "|"), vec1, value = TRUE)
#[1] "L Hernandez"       "HernandezLuciano "
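
The strsplit variant mentioned above can be sketched roughly as follows. This is only an illustration under some assumptions: build_pattern is a hypothetical helper name, the query is assumed to consist of exactly two space-separated words, and the words are assumed to contain no regex metacharacters.

```r
# Hypothetical helper: build a regex that matches the two words of `query`
# in either order, with an optional space between them.
build_pattern <- function(query) {
  parts <- strsplit(query, " ", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 2)                   # assumption: exactly two words
  forward  <- paste(parts, collapse = " ?")       # "L ?Hernandez"
  backward <- paste(rev(parts), collapse = " ?")  # "Hernandez ?L"
  paste(forward, backward, sep = "|")
}

vec1 <- c("L Hernandez", "HernandezOlaf ", "HernandezLuciano ", "HernandezAdrian ")
pattern <- build_pattern("L Hernandez")
matches <- grep(pattern, vec1, value = TRUE)
print(matches)
```

This builds the same pattern as the hand-written one at the top of this answer, so it should return the same two elements.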

Answer 1: (score: 0)

You can use the agrep function for approximate string matching. If you just run the function with its defaults, it matches every string...

agrep("L Hernandez", c("HernandezOlaf ",    "HernandezLuciano ", "HernandezAdrian "))
[1] 1 2 3

But if you modify the pattern slightly, "L Hernandez" -> "Hernandez L"

agrep("Hernandez L", c("HernandezOlaf ",    "HernandezLuciano ", "HernandezAdrian "))
[1] 1 2 3

and change the maximum distance

agrep("Hernandez L", c("HernandezOlaf ",    "HernandezLuciano ", "HernandezAdrian "), max.distance = 0.01)
[1] 2

you get the right answer. This is just an idea; it may work for you :)
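
As a small sketch of the same idea, the call can name the max.distance argument explicitly and return the matching strings instead of their indices. The variable names candidates and hits are only illustrative.

```r
candidates <- c("HernandezOlaf ", "HernandezLuciano ", "HernandezAdrian ")

# A max.distance below 1 is interpreted as a fraction of the pattern length,
# so a small value allows only very few edits.
hits <- agrep("Hernandez L", candidates, max.distance = 0.01, value = TRUE)
print(hits)
```

With value = TRUE the result can be used directly, without indexing back into the vector. The right threshold depends on the data, so it is worth trying a few values.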

Answer 2: (score: 0)

If you only want the full name that follows a capital L, you can adapt the following:

vec1[grepl("Hernandez", vec1) & grepl("L\\.*", vec1)]
[1] "L Hernandez"       "HernandezLuciano "

vec1[grepl("Hernandez", vec1) & grepl("L[[:alpha:]]", vec1)]
[1] "HernandezLuciano "

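The first expression looks for "Hernandez" and then checks whether there is a capital "L" followed by any character or a space. Note that "\\.*" actually matches zero or more literal dots, so any string containing a capital "L" passes. The second version requires a letter after the capital "L".

By the way, it seems you cannot combine the two forms: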

vec1[grepl("Hernandez", vec1) & grepl("L\\[[:alpha:]]", vec1)]
character(0)
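
To see why the three patterns behave differently, it can help to look at each grepl call on its own. The sketch below reuses the vec1 vector from the first answer; the names any_L, L_then_letter and escaped are only illustrative.

```r
vec1 <- c("L Hernandez", "HernandezOlaf ", "HernandezLuciano ", "HernandezAdrian ")

# "L\\.*": a capital L followed by zero or more literal dots,
# so it is TRUE for any string containing a capital L.
any_L <- grepl("L\\.*", vec1)

# "L[[:alpha:]]": a capital L immediately followed by a letter.
L_then_letter <- grepl("L[[:alpha:]]", vec1)

# "L\\[[:alpha:]]": the backslash escapes the first bracket, so the pattern
# now asks for a literal "[" right after the L, which no element contains.
escaped <- grepl("L\\[[:alpha:]]", vec1)

print(rbind(any_L, L_then_letter, escaped))
```

Only the second pattern separates "HernandezLuciano " from "L Hernandez", because in the latter the L is followed by a space rather than a letter.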