Question

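I am following up on a question here that asks how to add the prefix "not_" to a word following a negation.

In the comments, MrFlick proposed a solution using the regex gsub("(?<=(?:\\bnot|n't) )(\\w+)\\b", "not_\\1", x, perl=T).

I would like to edit this regex so that the not_ prefix is added to all the words following "not" or "n't" until there is some punctuation.

If I'm editing cptn's example, I'd like: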

x <- "They didn't sell the company, and it went bankrupt"

to be transformed into:

"They didn't not_sell not_the not_company, and it went bankrupt"

Can this still be done with backreferences? If so, an example would be much appreciated. Thanks!

Answer 1

You can use

(?:\bnot|n't|\G(?!\A))\s+\K(\w+)\b

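and replace with not_\1. See the regex demo.

Details

(?:\bnot|n't|\G(?!\A)) - any one of three alternatives:
- \bnot - not at the start of a word (the \s+ that follows makes it a whole word)
- n't - the literal n't
- \G(?!\A) - the end of the previous successful match (the (?!\A) lookahead excludes the start of the string)
\s+ - 1+ whitespace characters
\K - match reset operator, which discards the text matched so far
(\w+) - Group 1 (referenced with \1 in the replacement pattern): 1+ word characters (digits, letters or _)
\b - a word boundary.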
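
To see what the \G(?!\A) alternative contributes, here is a small sketch that runs the pattern with and without it. The sample sentence and the variable names are made up for illustration and are not from the question.

```r
# Hypothetical sample sentence, not taken from the question
s <- "did not go home"

# Without \G: only the word right after "not" is matched
first_only <- gsub("\\bnot\\s+\\K(\\w+)\\b", "not_\\1", s, perl = TRUE)

# With \G(?!\A): a match may also start where the previous one ended,
# so the prefix carries on from word to word
chained <- gsub("(?:\\bnot|\\G(?!\\A))\\s+\\K(\\w+)\\b", "not_\\1", s, perl = TRUE)

first_only  # "did not not_go home"
chained     # "did not not_go not_home"
```

The chain stops as soon as the next character is not whitespace, because \s+ must match right after the previous word. That is what makes punctuation end the negated span.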

R demo：

x <- "They didn't sell the company, and it went bankrupt"
gsub("(?:\\bnot|n't|\\G(?!\\A))\\s+\\K(\\w+)\\b", "not_\\1", x, perl=TRUE)
## => [1] "They didn't not_sell not_the not_company, and it went bankrupt"

Answer 2

First, you should split the string on the punctuation you care about. For example:

x <- "They didn't sell the company, and it went bankrupt. Then something else"
x_split <- strsplit(x, split = "[,.]")
x_split
## [[1]]
## [1] "They didn't sell the company" " and it went bankrupt"        " Then something else"

Then apply the regex to each piece in x_split[[1]]. Finally, merge all the pieces back together (if needed).
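
A minimal sketch of the whole split, tag and merge round trip might look like the following. The helper tag_piece and the variables pieces, delims and result are hypothetical names introduced here. The per-piece step uses a simple regexpr-based approach chosen for this illustration, not a regex given in the thread. Because strsplit drops the delimiters, they are collected separately so the punctuation can be put back.

```r
x <- "They didn't sell the company, and it went bankrupt. Then something else"

# Split on the chosen punctuation and remember the delimiters that were removed
pieces <- strsplit(x, split = "[,.]")[[1]]
delims <- regmatches(x, gregexpr("[,.]", x))[[1]]

# Hypothetical helper: prefix every word that follows the first negation in a piece
tag_piece <- function(piece) {
  m <- regexpr("\\bnot\\b|n't", piece, perl = TRUE)
  if (m == -1) return(piece)                 # no negation in this piece
  cut <- m + attr(m, "match.length")         # first character after the negation
  head <- substr(piece, 1, cut - 1)
  tail <- substr(piece, cut, nchar(piece))
  paste0(head, gsub("(\\w+)", "not_\\1", tail, perl = TRUE))
}

tagged <- vapply(pieces, tag_piece, character(1), USE.NAMES = FALSE)

# Pad the delimiters so each piece gets one (the last piece may have none)
length(delims) <- length(tagged)
delims[is.na(delims)] <- ""

result <- paste0(tagged, delims, collapse = "")
result
# "They didn't not_sell not_the not_company, and it went bankrupt. Then something else"
```

Only the first negation in each piece is looked up, which is enough here because everything after it in the same piece is tagged anyway.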

Answer 3

This is not ideal, but it gets the job done:

x <- "They didn't sell the company, and it did not go bankrupt. That's it" 

gsub("((^|[[:punct:]]).*?(not|n't)|[[:punct:]].*?((?<=\\s)[[:punct:]]|$))(*SKIP)(*FAIL)|\\s", 
     " not_", x, 
     perl = TRUE)

# [1] "They didn't not_sell not_the not_company, and it did not not_go not_bankrupt. That's it"

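Notes:

This uses the (*SKIP)(*FAIL) trick to skip over any text you don't want the regex to match. It basically replaces every space with " not_", except for the spaces that fall between:

the start of the string or a punctuation mark and "not" or "n't", or

a punctuation mark and the next punctuation mark (not followed by a space) or the end of the string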
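
The same (*SKIP)(*FAIL) idea is easier to follow on a smaller case. The sketch below is an illustration with a made-up string, not part of the answer's solution: it replaces every space with an underscore, except for spaces inside parentheses.

```r
# Hypothetical input chosen for illustration
s <- "a b (c d) e"

# Left alternative: match a parenthesised chunk, then (*SKIP)(*FAIL) throws that
# match away and tells the engine not to retry inside the chunk.
# Right alternative: the thing we actually want to replace (a whitespace character).
out <- gsub("\\([^)]*\\)(*SKIP)(*FAIL)|\\s", "_", s, perl = TRUE)

out  # "a_b_(c d)_e"
```

In the answer's pattern the left alternative plays the same role: it consumes the stretches where spaces must be left alone, so only the spaces in a negated span reach the \s alternative.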

Negation in R: how can I replace words following a negation in R?
