Question

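(In R gsub()), I need to capture the four words that follow a particular phrase inside a larger string. Building on advice found elsewhere, I came up with: ^.*\\b(particular phrase)\\W+(\\w+\\W+\\w+\\W+\\w+\\W+\\w+).*$

For example: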

this_txt <- "Blah blah particular phrase Extract These Words Please for the blah blah. Ignore blah this other stuff blah blah, blah."
this_pattern <- "^.*\\b(particular phrase)\\W+(\\w+\\W+\\w+\\W+\\w+\\W+\\w+).*$"
gsub(this_pattern, "\\2", this_txt, ignore.case = TRUE)
# [1] "Extract These Words Please"

But repeating \\w+\\W+ in the pattern seems ridiculous. Surely there is a better way. I thought ^.*\\b(particular phrase)\\W+(\\w+\\W+){4}.*$ might work, but it does not.

Answer 1

You can use

^.*\b(particular phrase)\W+((?:\w+\W+){3}\w+).*$

See the regex demo.

In R:

this_pattern <- "^.*\\b(particular phrase)\\W+((?:\\w+\\W+){3}\\w+).*$"

Here (\w+\W+\w+\W+\w+\W+\w+) is replaced with ((?:\w+\W+){3}\w+). ((?:\w+\W+){3}\w+) is a capturing group, (...), that contains two subpatterns:

- (?:\w+\W+){3} - three repetitions of a non-capturing group that matches:
  - \w+ - 1 or more word characters
  - \W+ - 1 or more non-word characters
- \w+ - 1 or more word characters.
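
As a rough end-to-end check, the sketch below runs the suggested pattern against the sample string from the question and contrasts it with the `(\\w+\\W+){4}` attempt. A repeated capturing group retains only its last iteration, which is why that attempt does not return all four words. The variable names `four_words`, `naive_pattern`, and `last_only` are made up for illustration, and `perl = TRUE` is an assumption added here to pin down the regex engine; the original post does not use it.

```r
this_txt <- "Blah blah particular phrase Extract These Words Please for the blah blah. Ignore blah this other stuff blah blah, blah."

# Suggested pattern: a non-capturing group repeated three times plus a final
# word, all wrapped in a single capturing group (group 2).
this_pattern <- "^.*\\b(particular phrase)\\W+((?:\\w+\\W+){3}\\w+).*$"
four_words <- gsub(this_pattern, "\\2", this_txt, ignore.case = TRUE, perl = TRUE)
print(four_words)
# [1] "Extract These Words Please"

# Attempt from the question: the capturing group itself is repeated, so
# group 2 holds only the text matched by the fourth (last) repetition.
naive_pattern <- "^.*\\b(particular phrase)\\W+(\\w+\\W+){4}.*$"
last_only <- gsub(naive_pattern, "\\2", this_txt, ignore.case = TRUE, perl = TRUE)
print(trimws(last_only))
```

Wrapping the repetition in an outer capturing group, as the answer does, is what lets the backreference \\2 return the whole four-word span rather than only its final piece.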

How to say (\w+\W+) times 4 in regex (R gsub)
