R: Substring matching

Time: 2016-01-22 07:41:20

Tags: r match

I have a character column names that contains the following:

Raymond K
Raymond K-S
Raymond KS
Bill D
Raymond Kerry
Blanche D
Blanche Diamond
Bill Dates

I also have a character vector m_names that contains the following:

Raymond K
Blanche D

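I want to create a column outcome that returns a nonzero integer when a name contains a matching substring, and 0 when nothing matches. For example, for the text column above, I would ideally like to see the result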

[1] 1 1 1 0 1 2 2 0

So far I have tried the following code:

outcome <- pmatch(as.character(names), m_names, nomatch = 0)

But this only returns the following outcome

[1] 1 0 0 0 1 2 0 0

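How can I make sure the code still returns a value identifying a partial match in R, even when there is no exact match?

3 answers:

Answer 0 (score: 4)

A simple example with some documents and search strings: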

# Some documents
docs <- c("aab", "aba", "bbaa", "b")

# Some search strings (regular expressions)
searchstr <- c("aa", "ab")

1) The numbers in the result vector should count how many search strings match (1 means that either "aa" or "ab" matches, 2 means that both match)

Reduce('+', lapply(searchstr, grepl, x = docs))
# Returns: [1] 2 1 1 0

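2) The numbers in the result should indicate whether search string 1 or search string 2 matched. If both match, the highest number is returned. (I think this is what you intended.)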

n <- length(searchstr)
Reduce(pmax, lapply(1:n, function(x) x * grepl(searchstr[x], docs)))
# Returns: [1] 2 2 1 0

Now, finally, consider your example:

docs <- c("Raymond K", "Raymond K-S", "Raymond KS", "Bill D", 
          "Raymond Kerry", "Blanche D", "Blanche Diamond", 
          "Bill Dates")
searchstr <- c("Raymond K", "Blanche D")
n <- length(searchstr)  # recompute n for the new search strings
Reduce(pmax, lapply(1:n, function(x) x * grepl(searchstr[x], docs)))
# Returns: [1] 1 1 1 0 1 2 2 0
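
Note that `grepl` treats the search strings as regular expressions, so a name containing a metacharacter such as `.` would not be matched literally. Here is a minimal sketch of the same idea wrapped in a function, assuming literal substring matching is what is wanted (`fixed = TRUE`). The name `match_index` is hypothetical and not part of the original answer.

```r
# Hypothetical helper (an illustration, not from the original answer):
# for each element of x, return the highest index of a pattern it contains,
# or 0 if no pattern is contained in it.
match_index <- function(x, patterns) {
  hits <- lapply(seq_along(patterns), function(i) {
    # fixed = TRUE: treat the pattern as a literal substring, not a regex
    i * grepl(patterns[i], x, fixed = TRUE)
  })
  # Start from a vector of zeros so an empty pattern set yields all zeros
  as.integer(Reduce(pmax, hits, numeric(length(x))))
}

names_col <- c("Raymond K", "Raymond K-S", "Raymond KS", "Bill D",
               "Raymond Kerry", "Blanche D", "Blanche Diamond",
               "Bill Dates")
match_index(names_col, c("Raymond K", "Blanche D"))
# [1] 1 1 1 0 1 2 2 0

# With fixed = TRUE the dot is literal, so "axb" does not match "a.b"
match_index(c("a.b", "axb"), "a.b")
# [1] 1 0
```

If regular-expression behaviour is actually desired, dropping `fixed = TRUE` restores the behaviour of the code above.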

Answer 1 (score: 1)

I would use stringi for this:

library("stringi")

# data example:

a <- read.table(text="
                Raymond K
                Raymond K-S
                Raymond KS
                Bill D
                Raymond Kerry
                Blanche D
                Blanche Diamond
                Bill Dates", 
                stringsAsFactors=FALSE, sep="\t")

wek <- c("Raymond K", "Blanche D")

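# solution: label each row with the index of the search string it contains
# (0 = no match; if several search strings match, the last one wins)

klasa <- numeric(length(a[, 1]))
for(i in seq_along(wek)){
    klasa[stri_detect_fixed(a[, 1], wek[i])] <- i
}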

Answer 2 (score: 1)

# create an empty outcome vector (all zeros, i.e. "no match")

outcome <- vector(mode = "integer", length = length(names))

# loop for the length of compare vector (m_names)
for(i in 1:length(m_names)) {
  outcome[grep(m_names[i],names)]<-i
}
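
One point worth keeping in mind with a loop like this: each pass overwrites earlier assignments, so a name containing several search strings ends up with the index of the last one. The sketch below illustrates this and shows one way to make the first match win instead, by iterating in reverse. The data and the variable names `names_vec`, `patterns`, `outcome_last` and `outcome_first` are made up for illustration, and `fixed = TRUE` is an assumption that literal matching is wanted.

```r
# Illustrative data (not from the question): the second name contains
# both search strings.
names_vec <- c("Raymond K", "Raymond Kerry", "Bill D")
patterns  <- c("Raymond K", "Raymond Ke")

# Forward loop: later patterns overwrite earlier ones (last match wins)
outcome_last <- integer(length(names_vec))
for (i in seq_along(patterns)) {
  outcome_last[grep(patterns[i], names_vec, fixed = TRUE)] <- i
}
outcome_last
# [1] 1 2 0

# Reverse loop: earlier patterns overwrite later ones (first match wins)
outcome_first <- integer(length(names_vec))
for (i in rev(seq_along(patterns))) {
  outcome_first[grep(patterns[i], names_vec, fixed = TRUE)] <- i
}
outcome_first
# [1] 1 1 0
```

For the question's data the two orders give the same result, since no name contains both "Raymond K" and "Blanche D".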