Backreference numbering for groups with sub-groups

Time: 2018-09-24 22:45:40

Tags: r regex pcre

I want to replace the word "fan" with "fanatic" when it is preceded by a pronoun-verb combination, as shown below.

gsub(
    "(((s?he( i|')s)|((you|they|we)( a|')re)|(I( a|')m)).{1,20})(\\b[Ff]an)(s?\\b)", 
    '\\1\\2atic\\3', 
    'He\'s the bigest fan I know.', 
    perl = TRUE, ignore.case = TRUE
)

## [1] "He's the bigest He'saticHe's I know."

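I understand that the numbered backreferences refer to the inner parentheses of the first group. Is there a way to make them refer only to the three outer pairs of parentheses, that is, in pseudocode: (stuff before fan)(fan)(s\\b)?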
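To make the numbering concrete, here is a minimal sketch with a toy pattern that is not part of the original post. Groups are numbered by the position of their opening parenthesis, so nested groups take their numbers before any later top-level group does. In the pattern above, `(\\b[Ff]an)` is the tenth opening parenthesis and `(s?\\b)` the eleventh, while the R documentation for `gsub` describes backreferences only from `\1` to `\9`.

```r
# Toy pattern (an illustration only, not the pattern from the post).
# Groups are numbered left to right by their opening parenthesis:
#   \1 = ((a)(b)), \2 = (a), \3 = (b), \4 = (c)
nested <- sub("((a)(b))(c)", "[\\1|\\2|\\3|\\4]", "abc", perl = TRUE)
print(nested)  # "[ab|a|b|c]"

# Marking the inner groups as non-capturing with (?:...) leaves only
# the two outer groups numbered: \1 = ab, \2 = c
flat <- sub("((?:a)(?:b))(c)", "[\\1|\\2]", "abc", perl = TRUE)
print(flat)    # "[ab|c]"
```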

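I know the regex itself matches correctly, because replacing the whole match works as expected. The problem is only the backreference part.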

gsub(
    "(((s?he( i|')s)|((you|they|we)( a|')re)|(I( a|')m)).{1,20})(\\b[Ff]an)(s?\\b)", 
    '', 
    'He\'s the bigest fan I know.', 
    perl = TRUE, ignore.case = TRUE
)

## [1] " I know."

Desired output:

## [1] "He's the bigest fanatic I know."

Match examples

inputs <- c(
    "He's the bigest fan I know.",
    "I am a huge fan of his.",
    "I know she has lots of fans in his club",
    "I was cold and turned on the fan",
    "An air conditioner is better than 2 fans at cooling."
)


outputs <- c(
    "He's the bigest fanatic I know.",
    "I am a huge fanatic of his.",
    "I know she has lots of fanatics in his club",
    "I was cold and turned on the fan",
    "An air conditioner is better than 2 fans at cooling."
)

1 answer:

Answer 0 (score: 4)

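As I understand it, the trouble is that you have too many capturing groups. Turn the ones you are not interested in into non-capturing groups, or remove the redundant ones: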

((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\b(Fan)(s?)\b

See the regex demo.
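One way to check which groups are still captured is to list them with `regexec()` and `regmatches()`. This is only an inspection sketch, not part of the original answer, and it assumes an R version in which `regexec()` accepts `perl = TRUE` (3.3.0 or later, as far as I know). For the first example string, the result should hold the full match followed by exactly three captures.

```r
# The answer's pattern, written as an R string (lower-case "fan",
# relying on ignore.case = TRUE).
pattern <- "((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\\b(fan)(s?)\\b"
x <- "He's the bigest fan I know."

# regexec() reports the position of the full match and of each capture
# group; regmatches() extracts the corresponding substrings.
m <- regexec(pattern, x, perl = TRUE, ignore.case = TRUE)
captures <- regmatches(x, m)[[1]]

# Element 1 is the full match; elements 2 to 4 are \1, \2 and \3.
print(captures)
```

Here `\3` is the empty string because the sentence uses the singular "fan".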

Note that [Ff] can be reduced to F or f, since you are using the ignore.case = TRUE argument.
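A small sketch of that point, using a made-up input string rather than one from the question: with `ignore.case = TRUE` a lower-case `fan` in the pattern also matches "Fan", and the backreference inserts the text as it appeared in the input, so the original capitalisation is kept.

```r
# Hypothetical input, chosen only to show both capitalisations.
x <- "Fans and fan"

# "fan" in the pattern matches "Fan" and "fan" alike; \1 carries the
# matched text itself, so the case of the input is preserved.
res <- gsub("\\b(fan)(s?)\\b", "\\1atic\\2", x, perl = TRUE, ignore.case = TRUE)
print(res)  # "Fanatics and fanatic"
```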

R demo

gsub(
    "((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\\b(fan)(s?)\\b", 
    '\\1\\2atic\\3', 
    inputs, 
    perl = TRUE, ignore.case = TRUE
)

Output:

[1] "He's the bigest fanatic I know."                     
[2] "I am a huge fanatic of his."                         
[3] "I know she has lots of fans in his club"             
[4] "I was cold and turned on the fan"                    
[5] "An air conditioner is better than 2 fans at cooling."
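Comparing this result with the `outputs` vector from the question shows that the third element differs. "she has" is not one of the pronoun-verb forms in the alternation, which covers only the "is", "are" and "am" forms, so that sentence is left unchanged. A minimal check, reusing the question's `inputs` and `outputs`:

```r
inputs <- c(
    "He's the bigest fan I know.",
    "I am a huge fan of his.",
    "I know she has lots of fans in his club",
    "I was cold and turned on the fan",
    "An air conditioner is better than 2 fans at cooling."
)

outputs <- c(
    "He's the bigest fanatic I know.",
    "I am a huge fanatic of his.",
    "I know she has lots of fanatics in his club",
    "I was cold and turned on the fan",
    "An air conditioner is better than 2 fans at cooling."
)

result <- gsub(
    "((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\\b(fan)(s?)\\b",
    "\\1\\2atic\\3",
    inputs,
    perl = TRUE, ignore.case = TRUE
)

# Element-wise comparison with the desired outputs from the question.
matches_desired <- result == outputs
print(matches_desired)  # TRUE TRUE FALSE TRUE TRUE
```

If "she has" should count as a trigger as well, the alternation would need an extra branch for it; the original answer does not include one.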