Question

I am trying to use the gsub() function in R to extract a substring by pattern.

# Example: extracting "7 years" substring.
string <- "Psychologist - 7 years on the website, online"
gsub(pattern="[0-9]+\\s+\\w+", replacement="", string)

[1] "Psychologist -  on the website, online"

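As you can see, it is easy to remove the substring with gsub(), but I need to invert the result and get only "7 years". I thought about using "^", something like this: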

gsub(pattern="[^[0-9]+\\s+\\w+]", replacement="", string)

Could anyone please help me with the correct regex pattern?

Answer 1

You may use

sub(pattern=".*?([0-9]+\\s+\\w+).*", replacement="\\1", string)

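See this R demo.

Details

.*? - any 0+ characters, as few as possible
([0-9]+\\s+\\w+) - capturing group 1:
- [0-9]+ - one or more digits
- \\s+ - one or more whitespace characters
- \\w+ - one or more word characters
.* - the rest of the string (any 0+ characters, as many as possible)

The \1 in the replacement is replaced with the contents of group 1. Because the pattern matches the whole string, everything outside the group is dropped.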
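
As a rough sketch of the breakdown above, the snippet below applies the capture-group pattern to the question's string and compares it with a direct extraction through base R's regmatches() and regexpr(). The direct extraction is an alternative that the answer does not mention, and perl = TRUE is added here as an assumption so that the lazy .*? follows PCRE semantics. The variable names via_sub and via_regmatches are illustrative only.

```r
string <- "Psychologist - 7 years on the website, online"

# Capture-group approach: the pattern consumes the whole string,
# and the replacement keeps only what group 1 captured.
# perl = TRUE is an assumption added for this sketch.
via_sub <- sub(pattern = ".*?([0-9]+\\s+\\w+).*", replacement = "\\1",
               string, perl = TRUE)

# Alternative (not part of the answer above): locate the first match
# with regexpr() and pull it out with regmatches().
via_regmatches <- regmatches(string, regexpr("[0-9]+\\s+\\w+", string))

print(via_sub)
print(via_regmatches)
```

Both calls should return the same substring for this input. The regmatches() form avoids having to match the text around the target at all.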

Answer 2

You could use \D, the opposite of \d, in R:

string <- "Psychologist - 7 years on the website, online"
sub(pattern = "\\D*(\\d+\\s+\\w+).*", replacement = "\\1", string)
# [1] "7 years"

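\D* matches as many non-digit characters as possible. The part that follows is captured in a group, and that group then replaces the complete string.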
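
One thing to keep in mind with this approach is that sub() returns its input unchanged when the pattern does not match. The sketch below shows one possible way to handle that, under the assumption that NA is an acceptable marker for "no match". The second input string is a hypothetical example that is not part of the question.

```r
# The first string is from the question; the second one is hypothetical
# and contains no digits, so the pattern cannot match it.
strings <- c("Psychologist - 7 years on the website, online",
             "Psychologist, online")

pattern <- "\\D*(\\d+\\s+\\w+).*"

# Replace each full match with the contents of group 1.
extracted <- sub(pattern, "\\1", strings)

# sub() leaves non-matching strings as they are, so flag them explicitly.
extracted[!grepl(pattern, strings)] <- NA_character_

print(extracted)
```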

See a demo on regex101.com.

How to extract a substring by inverse pattern with R?

2 answers: