R: extract items from a string

Time: 2014-12-06 06:31:28

Tags: regex r

I am trying to extract every word in the following string that contains two adjacent vowels.

x <- "The team sat next to each other all year and still failed."

The expected result is "team", "each", "year", "failed".

到目前为止,我已尝试使用[aeiou][aeiou]regmatches一起使用,但它只给了我部分内容。

感谢。

4 answers:

Answer 0 (score: 5)

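You can put \w* before and after the character class so that it also matches "zero or more" word characters on either side of the vowel pair, i.e. the rest of the word.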

x <- "The team sat next to each other all year and still failed."
regmatches(x, gregexpr('\\w*[aeiou]{2}\\w*', x))[[1]]
# [1] "team"   "each"   "year"   "failed"
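Two details of that answer, sketched below: [aeiou]{2} is simply [aeiou][aeiou] written with a quantifier, and regmatches() returns a list with one element per input string, which is why the answer indexes with [[1]]. The helper name vowel_pair_words() and the second test sentence are hypothetical, for illustration only:

```r
# Hypothetical helper (illustration only): returns a list with one
# character vector of matches per input string.
vowel_pair_words <- function(s) {
  # \\w* on both sides extends each two-vowel match to the whole word
  regmatches(s, gregexpr("\\w*[aeiou]{2}\\w*", s))
}

x <- "The team sat next to each other all year and still failed."
y <- "Good food is cheap."

res <- vowel_pair_words(c(x, y))
res[[1]]  # "team" "each" "year" "failed"
res[[2]]  # "Good" "food" "cheap"

# [aeiou]{2} and [aeiou][aeiou] describe the same pattern
same <- identical(res[[1]], regmatches(x, gregexpr("\\w*[aeiou][aeiou]\\w*", x))[[1]])
same  # TRUE
```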

Answer 1 (score: 4)

words <- unlist(strsplit(x, " "))
words[grepl("[aeiou]{2}", words)]
#[1] "team"    "each"    "year"    "failed."

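If you want to get rid of the trailing period, you could split on punctuation as well:

words <- unlist(strsplit(x, "[[:punct:] ]"))
words[grepl("[aeiou]{2}", words)]
#[1] "team"   "each"   "year"   "failed"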
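One caveat with splitting on a single [[:punct:] ] character: a punctuation mark followed by a space (a comma, for example) leaves an empty string between them. Those empty strings are harmless here because grepl() never matches them, but adding + to the split pattern avoids them altogether. A sketch using a made-up sentence with commas:

```r
s <- "We each sat, ate, and failed."

# Splitting on one character at a time leaves "" wherever ", " occurs
w1 <- unlist(strsplit(s, "[[:punct:] ]"))
w1  # "We" "each" "sat" "" "ate" "" "and" "failed"

# Treating a run of punctuation/spaces as one separator avoids the empty strings
w2 <- unlist(strsplit(s, "[[:punct:] ]+"))
w2[grepl("[aeiou]{2}", w2)]  # "each" "failed"
```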

Answer 2 (score: 1)

\w*[aeiou][aeiou]\w*

Try this. See the demo:

https://regex101.com/r/hJ3zB0/5
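The regex101 demo shows the raw pattern. Inside an ordinary R string literal each backslash has to be doubled, so \w is written "\\w" (R 4.0.0 and later also accept raw strings such as r"(\w*)"). A sketch of the same pattern in base R:

```r
x <- "The team sat next to each other all year and still failed."

# In R source, "\\w" is the two characters \w that the regex engine receives
pattern <- "\\w*[aeiou][aeiou]\\w*"
writeLines(pattern)  # prints the pattern as regex101 shows it

words_found <- regmatches(x, gregexpr(pattern, x))[[1]]
words_found  # "team" "each" "year" "failed"
```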

Answer 3 (score: 1)

The same with stringr:
library(stringr)
xx <- str_split(x, " ")[[1]]
xx[str_detect(xx, "[aeiou]{2}")]
## [1] "team"    "each"    "year"    "failed."

Edit

As @akrun pointed out, this can be simplified to:

str_extract_all(x, "\\w*[aeiou]{2}\\w*")[[1]]
## [1] "team"   "each"   "year"   "failed"