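How to find the strings with the longest match

Time: 2012-12-04 12:19:25

Tags: r

I have a string and a character vector. I want to find all the strings in the character vector that match as many leading characters of the string as possible. For example: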

s <- "abs"
vc <- c("ab","bb","abc","acbd","dert")

result <- c("ab","abc")

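The strings should match the first K characters of s exactly, and I want K to be as large as possible (maximum K <= the length of s). Here nothing matches "abs" (grep("abs", vc)), but "ab" has two matches (result <- grep("ab", vc)).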

3 answers:

Answer 0: (score: 2)

Another interpretation:

s <- "abs"
# Updated vc
vc <- c("ab","bb","abc","acbd","dert","abwabsabs")

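# Split s into single characters
st <- strsplit(s, "")[[1]]
# Score each candidate on its first nchar(s) characters, compared position by position with s
mtc <- sapply(strsplit(substr(vc, 1, nchar(s)), ""), 
              function(i) {
                m <- i == st[seq_along(i)]
                sum(m * cumsum(m))})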

vc[mtc == max(mtc)]
#[1] "ab"        "abc"       "abwabsabs"

# Another vector vc
vc <- c("ab","bb","abc","acbd","dert","absq","abab")
....
vc[mtc == max(mtc)]
#[1] "absq"

Since we only consider the beginning of each string, the longest match in the first case is "ab", even though "abwabsabs" contains "abs".
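Note that the score sum(m * cumsum(m)) also counts matches that come after a mismatch, so a candidate such as "xbs" would tie with "ab". A minimal sketch of a stricter variant that counts only the leading run of matching characters is shown below; the helper name prefix_len is hypothetical and not part of the original answer.

```r
# Hypothetical helper: number of leading characters of x that agree with s
prefix_len <- function(x, s) {
  n <- min(nchar(x), nchar(s))
  if (n == 0) return(0L)
  a <- strsplit(substr(x, 1, n), "")[[1]]
  b <- strsplit(substr(s, 1, n), "")[[1]]
  # cumprod() drops to 0 at the first mismatch and stays there,
  # so the sum is the length of the common prefix
  as.integer(sum(cumprod(a == b)))
}

s <- "abs"
vc <- c("ab", "bb", "abc", "acbd", "dert", "abwabsabs", "xbs")
len <- vapply(vc, prefix_len, integer(1), s = s, USE.NAMES = FALSE)
vc[len == max(len)]
# "ab" "abc" "abwabsabs"
```

If no candidate shares even the first character, max(len) is 0 and every element is returned, so a check such as max(len) > 0 may be worth adding.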

Edit: here is a "single pattern" solution. It could probably be more concise, but here we go...

vc <- c("ab","bb","abc","acbd","dert","abwabsabs")
(auxOne <- sapply((nchar(s)-1):1, function(i) substr(s, 1, i)))
#[1] "ab"   "a"
(auxTwo <- sapply(nchar(s):2, function(i) substring(s, i)))
#[1] "s" "bs" 
l <- attr(regexpr(
  paste0("^((",s,")|",paste0("(",auxOne,"(?!",auxTwo,"))",collapse="|"),")"),
  vc, perl = TRUE), "match.length")
vc[l == max(l)]
#[1] "ab"        "abc"       "abwabsabs"
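To make the single pattern easier to follow, the sketch below rebuilds it for s <- "abs" and prints both the pattern and the match length for each candidate. It assumes s contains no regular-expression metacharacters; the variable names pattern and lens are introduced here for illustration only.

```r
s <- "abs"
vc <- c("ab", "bb", "abc", "acbd", "dert", "abwabsabs")

# Proper prefixes of s, longest first, and the remainder that follows each one
auxOne <- sapply((nchar(s) - 1):1, function(i) substr(s, 1, i))
auxTwo <- sapply(nchar(s):2, function(i) substring(s, i))

# Alternation: the whole of s first, then each shorter prefix
pattern <- paste0("^((", s, ")|",
                  paste0("(", auxOne, "(?!", auxTwo, "))", collapse = "|"),
                  ")")
pattern
# "^((abs)|(ab(?!s))|(a(?!bs)))"

# Length of the match at the start of each string (-1 means no match)
lens <- attr(regexpr(pattern, vc, perl = TRUE), "match.length")
lens
# 2 -1 2 1 -1 2
```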

Answer 1: (score: 1)

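This approach uses grep to check whether the given string s matches the beginning of any string in vc, recursively removing one character from the end of s until something matches.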
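A minimal sketch of such a recursive function follows. The name longest_prefix_match is hypothetical, the code assumes s contains no regular-expression metacharacters, and it illustrates the idea rather than reproducing the answer's original code.

```r
# Hypothetical helper: try s as an anchored prefix, then shorten it by one character
longest_prefix_match <- function(s, vc) {
  # Nothing left to try: no candidate shares even the first character
  if (nchar(s) == 0) return(character(0))
  # "^" anchors the match at the start of each candidate
  hits <- grep(paste0("^", s), vc, value = TRUE)
  if (length(hits) > 0) return(hits)
  # No match: drop the last character of s and try again
  longest_prefix_match(substr(s, 1, nchar(s) - 1), vc)
}

s <- "abs"
vc <- c("ab", "bb", "abc", "acbd", "dert")
longest_prefix_match(s, vc)
# "ab" "abc"
```

Using value = TRUE makes grep return the matching strings rather than their indices, which is the form the question asks for.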

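Answer 2: (score: 0)

Long after the fact, just a note that the triebeard package now exists; it is very, very efficient and user-friendly for finding longest or partial matches.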