Select the word immediately after a keyword

Date: 2015-04-30 18:56:48

Tags: regex r

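I am trying to use R to extract the word that immediately follows a keyword. I don't have much experience with regular expressions, so nothing I have found so far has helped me much. Ideally, the function would return every instance when the keyword occurs more than once.

For example, if my keyword is the and my string is: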

The yellow log is in the stream

it would return yellow and stream.

I found this solution for C#, which looks like exactly what I want, but I am having trouble implementing it in R.

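2 answers:

Answer 0 (score: 2)

Try this. It finds 'yellow' and 'stream', though each match is returned together with the keyword that precedes it: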

x <- "The yellow log is in the stream"

regmatches(x, gregexpr("(?:(?:T|t)he)\\s(\\w+)", x, perl = TRUE))[[1]]
## [1] "The yellow" "the stream"
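If only the following word is wanted, without the keyword, one option is to move the keyword into a lookbehind so it is required but not part of the match. This is a minimal sketch rather than part of the answer above; the variable names pattern and words_after are made up for illustration, and it assumes the keyword is a plain word followed by exactly one whitespace character.

```r
x <- "The yellow log is in the stream"

# (?i) turns on case-insensitive matching, so "The" and "the" both count.
# (?<=\\bthe\\s) is a lookbehind: the keyword plus one whitespace character
# must come directly before the match, but is not returned with it.
pattern <- "(?i)(?<=\\bthe\\s)\\w+"

words_after <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
words_after
## [1] "yellow" "stream"
```

perl = TRUE is required here, because lookbehind is not available in R's default regex engine.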

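Answer 1 (score: 2)

The qdapRegex package that I maintain has a regular expression, after_, in its regex_supplement dictionary that is well suited to this. You can use rm_ to build your own after_the function: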

library(qdapRegex)

x<- "The yellow log is in the stream"
after_the <- rm_(pattern = S("@after_", "[Tt]he"), extract = TRUE)
after_the(x)

## [[1]]
## [1] "yellow" "stream"

The S function is a wrapper for sprintf that lets you easily pass elements (such as the word "the" in this case) into a base regular expression:

S("@after_", "the", "The")
## [1] "(?<=\\b(the|The)\\s)(\\w+)"
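To see roughly what that substitution amounts to, the same pattern can be assembled with sprintf in base R. This is only a sketch: the template string below is inferred from the output shown above and is an assumption about what the after_ entry looks like, not a copy of the package source.

```r
# Assumed shape of the after_ template: %s marks where the keywords go
template <- "(?<=\\b(%s)\\s)(\\w+)"

# Join the keyword variants with | so they become alternatives
pattern <- sprintf(template, paste(c("the", "The"), collapse = "|"))
pattern

# The resulting pattern works with base R as well
x <- "The yellow log is in the stream"
regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
```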

Edit

library(qdapRegex)

x <- c("The yellow log is in the stream", "I like the one box for a pack")
# after_the() as defined above also accepts a character vector
after_the(x)

# Generic extractor: the pattern is supplied on each call
after_ <- rm_(extract = TRUE)

words <- c("the", "a", "one")

setNames(lapply(words, function(y){
    after_(x, pattern = S("@after_", y, TC(y)))
}), words)


## $the
## $the[[1]]
## [1] "yellow" "stream"
## 
## $the[[2]]
## [1] "one"
## 
## 
## $a
## $a[[1]]
## [1] NA
## 
## $a[[2]]
## [1] "pack"
## 
## 
## $one
## $one[[1]]
## [1] NA
## 
## $one[[2]]
## [1] "box"
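For comparison, the same loop over several keywords can be sketched in base R without qdapRegex. The helper name after_keyword is hypothetical, and the sketch assumes each keyword is a plain word with no regex metacharacters. One difference from the output above: where a keyword does not occur, regmatches gives character(0) rather than NA.

```r
x <- c("The yellow log is in the stream", "I like the one box for a pack")
words <- c("the", "a", "one")

# Hypothetical helper: returns, for each string in x, the words that
# directly follow `keyword` (matched case-insensitively)
after_keyword <- function(x, keyword) {
  pattern <- sprintf("(?i)(?<=\\b%s\\s)\\w+", keyword)
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

# One list element per keyword, each holding one result per input string
res <- setNames(lapply(words, function(w) after_keyword(x, w)), words)
res
```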