Question

I am trying to extract all words that contain two adjacent vowels from the given string.

x <- "The team sat next to each other all year and still failed."

The expected result is "team", "each", "year", "failed".

So far I have tried using [aeiou][aeiou] with regmatches, but it only returns the matched part of each word rather than the whole word.

Thanks.

Answer 1

You can place \w* before and after the character class to match "zero or more" word characters on either side of the vowel pair.

x <- "The team sat next to each other all year and still failed."
regmatches(x, gregexpr('\\w*[aeiou]{2}\\w*', x))[[1]]
# [1] "team"   "each"   "year"   "failed"
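
To illustrate why the padding matters, here is a minimal sketch comparing the bare character class with the padded pattern on the same string. The variable names `pairs` and `whole` are only illustrative.

```r
x <- "The team sat next to each other all year and still failed."

# Bare character class: each match is only the vowel pair itself
pairs <- regmatches(x, gregexpr("[aeiou]{2}", x))[[1]]
print(pairs)
# [1] "ea" "ea" "ea" "ai"

# Padded with \\w* on both sides: each match grows to cover the whole word
whole <- regmatches(x, gregexpr("\\w*[aeiou]{2}\\w*", x))[[1]]
print(whole)
# [1] "team"   "each"   "year"   "failed"
```

Because \w does not match the trailing period, "failed" comes back without punctuation.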

Answer 2

words <- unlist(strsplit(x, " "))
words[grepl("[aeiou]{2}", words)]
#[1] "team"    "each"    "year"    "failed."

If you want to strip the trailing period, perhaps:

words <- unlist(strsplit(x, "[[:punct:] ]"))
words[grepl("[aeiou]{2}", words)]
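
As a self-contained sketch of the punctuation-aware split, the following runs the same idea end to end. The `nzchar()` filter is an added precaution, not part of the original answer: it drops the empty tokens that would appear if two separators were adjacent (for example ", "). The name `hits` is only illustrative.

```r
x <- "The team sat next to each other all year and still failed."

# Split on spaces or punctuation so "failed." becomes "failed"
words <- unlist(strsplit(x, "[[:punct:] ]"))

# Drop empty tokens that adjacent separators would produce (none in this string)
words <- words[nzchar(words)]

# Keep only the words that contain two adjacent vowels
hits <- words[grepl("[aeiou]{2}", words)]
print(hits)
# [1] "team"   "each"   "year"   "failed"
```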

Answer 3

\w*[aeiou][aeiou]\w*

Try this. See the demo.

https://regex101.com/r/hJ3zB0/5
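
The pattern above is written in plain regex notation. A minimal sketch of using it from R, assuming base R's `regmatches()` and `gregexpr()`, needs each backslash doubled inside the string literal. It also checks that `[aeiou][aeiou]` and `[aeiou]{2}` give the same result on this input; the names `m_pair` and `m_quant` are only illustrative.

```r
x <- "The team sat next to each other all year and still failed."

# Backslashes must be doubled inside an R string literal
m_pair  <- regmatches(x, gregexpr("\\w*[aeiou][aeiou]\\w*", x))[[1]]
m_quant <- regmatches(x, gregexpr("\\w*[aeiou]{2}\\w*", x))[[1]]

print(m_pair)
# [1] "team"   "each"   "year"   "failed"

# The two spellings of "two vowels in a row" agree on this string
print(identical(m_pair, m_quant))
# [1] TRUE
```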

Answer 4

The same with stringr:

library(stringr)
xx <- str_split(x, " ")[[1]]
xx[str_detect(xx, "[aeiou]{2}")]
## [1] "team"    "each"    "year"    "failed."

Edit

As @akrun pointed out, this can be simplified to

str_extract_all(x, "\\w*[aeiou]{2}\\w*")[[1]]
## [1] "team"   "each"   "year"   "failed"
