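I am following up on a question here that asked how to add the prefix "not_" to the word following a negation.
In the comments, MrFlick proposed a solution using the regular expression gsub("(?<=(?:\\bnot|n't) )(\\w+)\\b", "not_\\1", x, perl=T).
I would like to edit this regex so that the not_ prefix is added to all words following "not" or "n't", up to the next punctuation mark.
Editing cptn's example, I would like: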
x <- "They didn't sell the company, and it went bankrupt"
to be transformed into:
"They didn't not_sell not_the not_company, and it went bankrupt"
Can this still be done using a backreference? If so, an example would be much appreciated. Thanks!
Answer 0 (score: 1)
You can use
(?:\bnot|n't|\G(?!\A))\s+\K(\w+)\b
and replace with not_\1. See the regex demo.
Details
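(?:\bnot|n't|\G(?!\A)) - any of three alternatives:
    \bnot - the word not, starting at a word boundary
    n't - the literal text n't
    \G(?!\A) - the end of the previous successful match (but not the start of the string)
\s+ - 1+ whitespace characters
\K - match reset operator that discards the text matched so far
(\w+) - Group 1 (referred to with \1 in the replacement pattern): 1+ word characters (letters, digits or _)
\b - a word boundary.

x <- "They didn't sell the company, and it went bankrupt"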
gsub("(?:\\bnot|n't|\\G(?!\\A))\\s+\\K(\\w+)\\b", "not_\\1", x, perl=TRUE)
## => [1] "They didn't not_sell not_the not_company, and it went bankrupt"
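To see roughly where the \G chain stops, the same pattern can be wrapped in a small helper and run on a character vector. This is only a sketch: the helper name tag_negation_scope and the sample sentences are invented for illustration; the pattern and the gsub call are the ones from the answer above.

```r
# Hypothetical helper (the name is illustrative, not from the answer):
# prefix every word that follows "not" or "n't" with "not_". The chain
# continues through \G only while words are separated by whitespace.
tag_negation_scope <- function(x) {
  gsub("(?:\\bnot|n't|\\G(?!\\A))\\s+\\K(\\w+)\\b", "not_\\1", x, perl = TRUE)
}

# Made-up sample input: one sentence with a negation, one without
sentences <- c(
  "It did not go well. We left",
  "No negation here"
)

tagged <- tag_negation_scope(sentences)
print(tagged)
```

Because \s+ cannot match the period after "well", the chain breaks there and the words after it should be left untouched; a string without "not" or "n't" should come back unchanged.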
Answer 1 (score: 0)
First, split the string on the punctuation marks you care about. For example:
x <- "They didn't sell the company, and it went bankrupt. Then something else"
x_split <- strsplit(x, split = "[,.]")
x_split
## [[1]]
## [1] "They didn't sell the company" " and it went bankrupt" " Then something else"
Then apply the regex to each element of the list x_split. Finally, merge the pieces back together (if needed).
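The split-then-apply idea could be sketched as follows. The helper tag_piece is a hypothetical name, and the way it finds the negation (regexpr plus substr) is one assumed way to fill in the step the answer leaves open, not the answer's own code. Note that splitting on "[,.]" drops the punctuation, so the pieces cannot be merged back into the exact original string without extra bookkeeping.

```r
x <- "They didn't sell the company, and it went bankrupt. Then something else"

# Split on commas and periods, as in the answer (the punctuation is dropped)
pieces <- strsplit(x, split = "[,.]")[[1]]

# Hypothetical helper: within one piece, prefix every word that comes after
# the first "not" / "n't" with "not_"; pieces without a negation are returned as-is.
tag_piece <- function(piece) {
  m <- regexpr("\\bnot\\b|n't", piece, perl = TRUE)
  if (m[1] == -1L) return(piece)
  neg_end <- m[1] + attr(m, "match.length")[1] - 1L
  before <- substr(piece, 1, neg_end)
  after <- substr(piece, neg_end + 1L, nchar(piece))
  paste0(before, gsub("(\\w+)", "not_\\1", after, perl = TRUE))
}

tagged <- vapply(pieces, tag_piece, character(1), USE.NAMES = FALSE)
print(tagged)
```

Since each piece already ends at a punctuation mark, the replacement inside a piece needs no lookbehind or \G: every word after the negation can simply be prefixed.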
Answer 2 (score: 0)
This is not ideal, but it gets the job done:
x <- "They didn't sell the company, and it did not go bankrupt. That's it"
gsub("((^|[[:punct:]]).*?(not|n't)|[[:punct:]].*?((?<=\\s)[[:punct:]]|$))(*SKIP)(*FAIL)|\\s",
" not_", x,
perl = TRUE)
# [1] "They didn't not_sell not_the not_company, and it did not not_go not_bankrupt. That's it"
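Notes:
This uses the (*SKIP)(*FAIL) trick to skip over any pattern you do not want the regex to match. It essentially replaces every whitespace character with " not_", except the ones that lie between:
    the start of the string or a punctuation mark, and "not" or "n't"
    or
    a punctuation mark, and either the next punctuation mark that is preceded by whitespace (the (?<=\s) lookbehind) or the end of the string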
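The (*SKIP)(*FAIL) idiom may be easier to follow on a smaller, unrelated case. In this sketch (sample input invented for illustration), the left branch matches what should be protected, here a double-quoted chunk, and is then forced to fail without retrying inside it; the right branch matches what should actually be replaced.

```r
# Made-up sample input: spaces inside the quotes should be left alone
x <- 'keep "a b" but join c d'

# Left branch:  "\"[^\"]*\"" consumes a quoted chunk, then (*SKIP)(*FAIL)
#               discards that match and resumes scanning after the chunk.
# Right branch: "\\s" matches any whitespace that was not skipped.
out <- gsub("\"[^\"]*\"(*SKIP)(*FAIL)|\\s", "_", x, perl = TRUE)
print(out)
```

The pattern in the answer follows the same shape: the long left-hand alternation lists the stretches of text where whitespace must be kept, and the trailing |\\s picks up everything else.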