Partial matching of strings in R: getting a record of matches and non-matches

Date: 2015-08-12 04:44:33

Tags: r string

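I have an object named mystring that holds full-length strings, and these have partial matches in another object named match.string. I want to use match.string as bait to fish in mystring and pull out the full-length strings that match. I also want a record of which strings in mystring matched and which did not.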

mystring<-c("the_dootle_doo_bottle-doo","no_cuddle-doo_do_bottle-coo","tape-it-ape-it","mac-chicken-no-good")
match.string<-c("the_dootle","no_cuddle-doo","mac", "I-loathe-it","no-way")

My desired result:

"the_dootle_doo_bottle-doo" "no_cuddle-doo_do_bottle-coo" "mac-chicken-no-good"

In the result, I would also like to see which strings in match.string matched a string in mystring ("the_dootle","no_cuddle-doo","mac") and which did not ("I-loathe-it","no-way").

2 answers:

Answer 0 (score: 3)

You can do it like this:

l <- unlist(lapply(match.string, function(txt) mystring[grepl(txt, mystring)]))

This gives:

> l
[1] "the_dootle_doo_bottle-doo"   "no_cuddle-doo_do_bottle-coo" "mac-chicken-no-good"        

To get a record of what matched and what did not, you can do the following:

indx <- unlist(lapply(match.string, function(txt) grep(txt, mystring)))

This gives the indices of the matching strings in mystring:

> indx
[1] 1 2 4

With this index, you can get the matched and unmatched strings in mystring:

> mystring[indx]
[1] "the_dootle_doo_bottle-doo"   "no_cuddle-doo_do_bottle-coo" "mac-chicken-no-good"        
> mystring[-indx]
[1] "tape-it-ape-it"
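
One caveat with the negative index: if no pattern matches at all, indx is integer(0), and mystring[-indx] then returns an empty vector instead of every element of mystring. A minimal sketch of a logical-mask variant that sidesteps this, using only base R. The name `hit` is introduced here for illustration and is not part of the original answer:

```r
mystring <- c("the_dootle_doo_bottle-doo", "no_cuddle-doo_do_bottle-coo",
              "tape-it-ape-it", "mac-chicken-no-good")
match.string <- c("the_dootle", "no_cuddle-doo", "mac", "I-loathe-it", "no-way")

# Positions in mystring hit by any pattern (may contain repeats or be empty).
indx <- unlist(lapply(match.string, function(txt) grep(txt, mystring)))

# `hit` (hypothetical name) is TRUE for every element of mystring that matched.
hit <- seq_along(mystring) %in% indx

matched   <- mystring[hit]    # kept in the order of mystring, no duplicates
unmatched <- mystring[!hit]   # still correct when indx is empty
```

Because `hit` always has the same length as mystring, both subsets stay well defined even when nothing matches.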

To get the found and not-found items of match.string, you can do the following (as suggested by @Frank):

indx2 <- sapply(lapply(match.string, agrepl, mystring), any)

> match.string[indx2]
[1] "the_dootle"    "no_cuddle-doo" "mac"          
> match.string[!indx2]
[1] "I-loathe-it" "no-way" 

As an alternative, you can also create the index with magrittr:

library(magrittr)
indx2 <- sapply(match.string, . %>% agrepl(., mystring) %>% any )
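
Note that agrepl performs approximate (fuzzy) matching, so on other data it may report a hit where no exact substring is present. If exact substring matching is what is wanted, the following is a rough sketch of one way to get all four records from a single logical matrix, using grepl with fixed = TRUE. The names `m`, `found`, `not_found`, `matched` and `unmatched` are illustrative, and the sketch assumes mystring has more than one element so that sapply returns a matrix:

```r
mystring <- c("the_dootle_doo_bottle-doo", "no_cuddle-doo_do_bottle-coo",
              "tape-it-ape-it", "mac-chicken-no-good")
match.string <- c("the_dootle", "no_cuddle-doo", "mac", "I-loathe-it", "no-way")

# Rows correspond to mystring, columns to match.string.
# fixed = TRUE treats each pattern as a literal substring, not a regex.
m <- sapply(match.string, grepl, x = mystring, fixed = TRUE)

found     <- match.string[colSums(m) > 0]   # patterns that hit something
not_found <- match.string[colSums(m) == 0]  # patterns that hit nothing
matched   <- mystring[rowSums(m) > 0]       # strings hit by some pattern
unmatched <- mystring[rowSums(m) == 0]      # strings hit by no pattern
```

The matrix also keeps the pairing, so `which(m, arr.ind = TRUE)` shows which pattern matched which string.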

Answer 1 (score: 1)

Try:

mystring[pmatch(match.string,mystring)]

# [1] "the_dootle_doo_bottle-doo"   "no_cuddle-doo_do_bottle-coo" "mac-chicken-no-good"  NA    NA
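
Two things to keep in mind with pmatch: it only matches from the start of each string, and it returns NA when a pattern has no match or an ambiguous one. That works for this data because every pattern is a unique prefix. A small sketch of how the NA positions could be used to build the requested records; the names `idx`, `found` and `not_found` are illustrative only:

```r
mystring <- c("the_dootle_doo_bottle-doo", "no_cuddle-doo_do_bottle-coo",
              "tape-it-ape-it", "mac-chicken-no-good")
match.string <- c("the_dootle", "no_cuddle-doo", "mac", "I-loathe-it", "no-way")

# One entry per element of match.string; NA means no unique prefix match.
idx <- pmatch(match.string, mystring)

matched   <- mystring[idx[!is.na(idx)]]   # full-length strings, NAs dropped
found     <- match.string[!is.na(idx)]    # patterns that matched
not_found <- match.string[is.na(idx)]     # patterns that did not
```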