Question

I am trying to extract 22 chocolates from the following string:

   SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila.

using the regular expression \\d+\\s*(chocolates.|chocolate.). I tried:

grep("\\d+\\s*(chocolates.|chocolate.)",s)

but it does not give me the string 22 chocolates. How can I extract the part of the string that matches the regular expression?

Answer 1

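Your call does not return 22 chocolates because the pattern needs to be passed to an extraction function. grep only reports which elements of a character vector contain a match anywhere inside them (their indices by default, or the whole elements with value = TRUE), so it never returns just the matched substring.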
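To make the behaviour of grep concrete, here is a minimal base R sketch using the string and pattern from the question. The variable names idx, whole and hit are only illustrative.

```r
# The string and pattern from the question.
s <- "SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila."
p <- "\\d+\\s*(chocolates.|chocolate.)"

idx   <- grep(p, s)                # index of each element that contains a match
whole <- grep(p, s, value = TRUE)  # the matching elements themselves, in full
hit   <- grepl(p, s)               # logical: does each element contain a match?

print(idx)    # 1
print(whole)  # the entire input string, not just the matched part
print(hit)    # TRUE
```

None of these calls isolates the matched text, which is why a separate extraction step is needed.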

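Also note that the alternation group (chocolates.|chocolate.) can be shortened to chocolates?., since the two branches differ only in the plural s, and the ? quantifier (0 or 1 occurrences) makes that s optional. The examples below also drop the trailing ., which would otherwise pull one extra character after the word into the match.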
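As a quick check of that simplification, the following base R sketch compares the original alternation, the shortened form, and the shortened form without the trailing dot. The helper extract_first is a hypothetical name introduced here for illustration only.

```r
s <- "SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila."

long_p  <- "\\d+\\s*(chocolates.|chocolate.)"  # pattern from the question
short_p <- "\\d+\\s*chocolates?."              # same pattern, alternation folded into s?
bare_p  <- "\\d+\\s*chocolates?"               # trailing dot removed

# Hypothetical helper: return the first match of `pattern` in `text`.
extract_first <- function(pattern, text) {
  regmatches(text, regexpr(pattern, text))
}

long_hit  <- extract_first(long_p, s)   # "22 chocolates " (the dot consumes the space)
short_hit <- extract_first(short_p, s)  # same as long_hit
bare_hit  <- extract_first(bare_p, s)   # "22 chocolates"

print(identical(long_hit, short_hit))   # TRUE
print(bare_hit)
```

The first two patterns capture the same text, including the character that follows the word, while the dot-free version returns exactly 22 chocolates.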

One such extraction function is stringr::str_extract (use str_extract_all to get all matches):

> library(stringr)
> x <- " SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila."
> p <- "\\d+\\s*chocolates?"
> str_extract(x, p)
[1] "22 chocolates"

Or, in base R, use regmatches with regexpr (or gregexpr to extract multiple occurrences):

> x <- " SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila."
> p <- "\\d+\\s*chocolates?"
> regmatches(x, regexpr(p, x))
[1] "22 chocolates"
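
For the multiple-occurrence case mentioned above, a minimal base R sketch with gregexpr might look like this. The input string y is a made-up example, not taken from the question, and the last lines show what regmatches returns when nothing matches.

```r
p <- "\\d+\\s*chocolates?"

# Hypothetical input containing two matches and one near miss ("45 chamkila").
y <- "3 chocolates, 1 chocolate and 45 chamkila"

# gregexpr finds every match; regmatches returns a list with one
# character vector per input element, so take the first element.
all_hits <- regmatches(y, gregexpr(p, y))[[1]]
print(all_hits)  # "3 chocolates" "1 chocolate"

# With regexpr, an input without a match yields a zero-length result.
no_hit <- regmatches("no sweets here", regexpr(p, "no sweets here"))
print(length(no_hit))  # 0
```

Checking the length of the result before using it avoids surprises when an input has no match.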
