Question

I want to delete letters from a string while protecting specific words. Here is an example:

my.string <- "Water the 12 gold marigolds please, but not the 45 trees!"

desired.result <- "12 marigolds, 45 trees"

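I tried the code below, and the result surprised me. I thought () would protect whatever it contained. Instead, the opposite happened: only the words inside () were removed (along with the !).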

gsub("(marigolds|trees)\\D", "", my.string)

# [1] "Water the 12 gold please, but not the 45 "

Here is an example with a longer string:

my.string <- "Water the 12 gold marigolds please, but not the 45 trees!, The 7 orange marigolds are fine."

desired.result <- "12 marigolds, 45 trees, 7 marigolds"

gsub("(marigolds|trees)\\D", "", my.string)

returns:

[1] "Water the 12 gold please, but not the 45 , The 7 orange are fine."

Thank you for any suggestions. I would prefer a regex solution in base R.

Answer 1

Use word boundaries and a negative lookahead assertion.

> my.string <- "Water the 12 gold marigolds please, but not the 45 trees!"
> gsub("\\b(?!marigolds\\b|trees\\b)[A-Za-z]+\\s*", "", my.string, perl=TRUE)
[1] "12 marigolds , 45 trees!"
> gsub("\\b(?!marigolds\\b|trees\\b)[A-Za-z]+\\s*|!", "", my.string, perl=TRUE)
[1] "12 marigolds , 45 trees"
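
Note that the result above still has a space before the comma ("12 marigolds , 45 trees"), which differs slightly from the desired "12 marigolds, 45 trees". Below is a minimal sketch of one way to tidy that up with a second gsub pass. The helper name keep_words and its keep argument are hypothetical and not part of the original answer, and the sketch assumes the protected words contain no regex metacharacters.

```r
# Hypothetical helper: drop alphabetic words except those in `keep`,
# then tidy the punctuation that is left behind.
keep_words <- function(x, keep = c("marigolds", "trees")) {
  # Build the body of the negative lookahead from the protected words,
  # e.g. marigolds\b|trees\b (assumes no regex metacharacters in `keep`)
  lookahead <- paste0(keep, "\\b", collapse = "|")
  pattern <- paste0("\\b(?!", lookahead, ")[A-Za-z]+\\s*|[.?!]")
  out <- gsub(pattern, "", x, perl = TRUE)
  # Remove whitespace left in front of commas, then trim both ends
  out <- gsub("\\s+,", ",", out)
  trimws(out)
}

keep_words("Water the 12 gold marigolds please, but not the 45 trees!")
# [1] "12 marigolds, 45 trees"
```

The lookahead (?!marigolds\b|trees\b) is checked at the start of each word, so a protected word is skipped as a whole while every other run of letters is removed together with its trailing whitespace.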

Answer 2

Another way, using a capture group:

my.string <- "Water the 12 gold marigolds please, but not the 45 trees!, The 7 orange marigolds are fine."
gsub("(?i)\\b(?:(marigolds|trees)|[a-z]+)\\b\\s*|[.?!]", "\\1", my.string, perl=TRUE)
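
As a rough illustration of why this works: parentheses do not protect text by themselves, because gsub replaces the whole match. That is why the attempt in the question removed exactly the words inside (). What parentheses do is capture, and "\\1" in the replacement puts the captured text back. The sketch below reruns the same pattern with the steps spelled out in comments; the variable names pattern and result are only for illustration.

```r
my.string <- "Water the 12 gold marigolds please, but not the 45 trees!, The 7 orange marigolds are fine."

# Group 1 is set only when the matched word is marigolds or trees.
# For any other word, or for one of . ? !, group 1 is empty,
# so "\\1" replaces that match with nothing.
pattern <- "(?i)\\b(?:(marigolds|trees)|[a-z]+)\\b\\s*|[.?!]"
result <- gsub(pattern, "\\1", my.string, perl = TRUE)
result
# [1] "12 marigolds, 45 trees, 7 marigolds"
```

Because \\s* sits outside the capture group, the whitespace after a protected word is consumed as well, which is why no stray space is left before the commas.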

Protect specific words, delete letters from a string
