Question

What regular expression will remove the citation markers from Wikipedia article text?

Sample input:

 text <- "[76][note 7] just like traditional Hinduism regards the Vedas "

Expected output:

"just like traditional Hinduism regards the Vedas"

I tried:

> text <- "[76][note 7] just like traditional Hinduism regards the Vedas "
> library(stringr)
> str_replace_all(text, "\\[ \\d+ \\]", "")
[1] "[76][note 7] just like traditional Hinduism regards the Vedas "

Answer 1

Try this:

text <- "[76][note 7] just like traditional Hinduism regards the Vedas "
 library(stringr)
 str_replace_all(text, "\\[[^\\]]*\\]\\s*", "")

Output:

 "just like traditional Hinduism regards the Vedas "

Answer 2

This regex is one option:

(?!.*\]).*

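The negative lookahead (the parenthesized block) rejects every start position that still has a "]" somewhere after it, so the match begins just after the last "]". The rest of the expression, ".*", then matches what you want up to the end of the line (including the leading space, which is easy to trim in your language of choice).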
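
In R, a lookahead needs perl = TRUE, and because this pattern matches the text to keep rather than the text to remove, it pairs with extraction rather than replacement. A minimal sketch using base R's regexpr and regmatches (the variable name kept is only for illustration):

```r
text <- "[76][note 7] just like traditional Hinduism regards the Vedas "

# Lookaheads are PCRE syntax, so perl = TRUE is required.
# regexpr() locates the first match; regmatches() extracts it.
kept <- regmatches(text, regexpr("(?!.*\\]).*", text, perl = TRUE))

# The match starts right after the last "]", so it still carries the
# leading and trailing spaces; trimws() strips them.
trimws(kept)
```

Since this keeps only what follows the last bracket, it would drop the start of the sentence whenever a citation appears mid-sentence.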

Answer 3

This should do the trick:

trimws(sub("\\[.*\\]", "",text))

Result:

[1] "just like traditional Hinduism regards the Vedas"

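This pattern looks for an opening bracket (\\[), a closing bracket (\\]), and everything in between (.*).

By default, .* is greedy: it matches as much as possible, running past any intervening closing and opening brackets until it reaches the last closing bracket. The match is replaced with an empty string.

Finally, the trimws function removes the whitespace at the start and end of the result.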

Edit: removing citations within the sentence

If citations occur at several points in the sentence, the pattern and function change to:

trimws(gsub(" ?\\[.*?\\] ", " ", text))

For example, if the sentences are:

text1 <- "[76][note 7] just like traditional Hinduism [34] regards the Vedas "
text2 <- "[76][note 7] just like traditional Hinduism[34] regards the Vedas "

The respective results will be:

[1] "just like traditional Hinduism regards the Vedas"
[1] "just like traditional Hinduism regards the Vedas"

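Pattern changes:

.*? changes the quantifier from greedy to lazy: it matches as little as possible, stopping at the first closing bracket that lets the rest of the pattern match.

The leading " ?" (space + question mark) matches an optional space before the opening bracket.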
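
To see why the lazy quantifier matters here, the following sketch compares both forms on text1, with the space handling left out of the pattern so that only greediness differs. The variable names greedy and lazy are illustrative only.

```r
text1 <- "[76][note 7] just like traditional Hinduism [34] regards the Vedas "

# Greedy: .* runs from the first "[" to the last "]", so the words
# between the citations are deleted along with them.
greedy <- trimws(sub("\\[.*\\]", "", text1))

# Lazy: .*? stops at the nearest "]", so each citation is removed on its own.
# Without any space handling, a double space is left where "[34]" used to be.
lazy <- trimws(gsub("\\[.*?\\]", "", text1))

greedy
lazy
```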

Answer 4

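\\[ \\d+ \\] does not work because of the spaces in the pattern. Moreover, if you remove the spaces, the expression only matches [ + digits + ] and will not match substrings like [note 7].

Here is a base R solution using gsub with a TRE regex (no perl=TRUE needed):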
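
Both failure modes can be checked directly. This is a small sketch using base R's gsub; the variable names with_spaces and digits_only are illustrative only.

```r
text <- "[76][note 7] just like traditional Hinduism regards the Vedas "

# Original attempt: the literal spaces inside the brackets would have to be
# present in the input, so nothing matches and the string comes back unchanged.
with_spaces <- gsub("\\[ \\d+ \\]", "", text)

# Spaces removed: purely numeric citations now match, but "[note 7]" stays.
digits_only <- gsub("\\[\\d+\\]", "", text)

with_spaces
digits_only
```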

text <- "[76][note 7] just like traditional Hinduism regards the Vedas "
trimws(gsub("\\[[^]]+]", "", text))
## Or to remove only those [] that contain digits/word + space + digits
trimws(gsub("\\[(?:[[:alnum:]]+[[:blank:]]*)?[0-9]+]", "", text))

See the R demo.

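Pattern explanation:

\\[ - a literal [ (must be escaped outside a character class)
(?:[[:alnum:]]+[[:blank:]]*)? - an optional sequence (because of the ? quantifier at the end) of 1 or more alphanumeric characters followed by 0 or more spaces or tabs
[0-9]+ - 1 or more digits
] - a literal ] (does not need escaping outside a character class)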

trimws removes leading/trailing whitespace.
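
The difference between the two patterns shows up when the text contains a bracketed token that is not a citation. The input text3 below is a made-up example, not taken from the question, and the result names are illustrative only.

```r
# Hypothetical input: "[sic]" is bracketed but is not a numbered citation.
text3 <- "[76][note 7] a quote [sic] from the source[34] "

# First pattern: removes every bracketed token, including "[sic]".
all_brackets <- trimws(gsub("\\[[^]]+]", "", text3))

# Second pattern: removes only brackets that end in digits, so "[sic]" is kept.
numbered_only <- trimws(gsub("\\[(?:[[:alnum:]]+[[:blank:]]*)?[0-9]+]", "", text3))

all_brackets
numbered_only
```

Neither pattern touches the spaces around a removed token, so the first result keeps a double space where "[sic]" was.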

See the regex demo (note that the PCRE option is selected because it supports POSIX character classes; do not use this site to test TRE regex patterns!).

Regex to replace wiki citations in R