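Is there something like a counter variable in regular expression replace?

Asked: 2010-11-18 10:29:54

Tags: regex language-agnostic

If I have quite a few matches, for example in multiline mode, I want to replace each of them with part of the match plus a counter number that increments.

I was wondering whether any regex flavor has such a variable. I couldn't find one, but I seem to remember that something like it exists...

I'm not talking about scripting languages in which you can use callbacks for replacement. This is about being able to do it in tools like RegexBuddy, Sublime Text, gskinner.com/RegExr, and so on, in much the same way that you can refer to captured substrings with \1 or $1.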

2 Answers:

Answer 0: (score: 56)

FMTEYEWTK about Fancy Regexes

OK, I'm going to go from the simple to the sublime. Enjoy!

Simple s///e Solution

Given this:

#!/usr/bin/perl

$_ = <<"End_of_G&S";
    This particularly rapid,
        unintelligible patter
    isn't generally heard,
        and if it is it doesn't matter!
End_of_G&S

my $count = 0;

Then this:

s{
    \b ( [\w']+ ) \b
}{
    sprintf "(%s)[%d]", $1, ++$count;
}gsex;

produces this:

(This)[1] (particularly)[2] (rapid)[3],
    (unintelligible)[4] (patter)[5]
(isn't)[6] (generally)[7] (heard)[8], 
    (and)[9] (if)[10] (it)[11] (is)[12] (it)[13] (doesn't)[14] (matter)[15]!

Interpolated Code in Anon Array Solution

Whereas this:

s/\b([\w']+)\b/#@{[++$count]}=$1/g;

produces this:

#1=This #2=particularly #3=rapid,
    #4=unintelligible #5=patter
#6=isn't #7=generally #8=heard, 
    #9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter!
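As a hedged aside on why that one-liner works: @{[ ... ]} builds an anonymous array from the expression inside the brackets and dereferences it on the spot, so the value is interpolated like any other array in a double-quoted string (or in the replacement side of s///). A minimal sketch, using a made-up variable $n that is not part of the answer above:

```perl
use strict;
use warnings;

# $n is a hypothetical variable used only for this illustration.
my $n = 41;

# [ ... ] makes an anonymous array holding the expression's value;
# @{ ... } dereferences it, so the string interpolates the result.
my $line = "answer: @{[ $n + 1 ]}";

print "$line\n";    # prints "answer: 42"
```

The same trick is what lets ++$count run inside the replacement text without the /e flag.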

Solution with Code in LHS Instead of RHS

This puts the incrementation within the match itself:

s/ \b ( [\w']+ ) \b (?{ $count++ }) /#$count=$1/gx;

yields this:

#1=This #2=particularly #3=rapid,
    #4=unintelligible #5=patter
#6=isn't #7=generally #8=heard, 
    #9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter!

A Stuttering Stuttering Solution Solution Solution

This

s{ \b ( [\w'] + ) \b             }
 { join " " => ($1) x ++$count   }gsex;

generates this delightful answer:

This particularly particularly rapid rapid rapid,
    unintelligible unintelligible unintelligible unintelligible patter patter patter patter patter
isn't isn't isn't isn't isn't isn't generally generally generally generally generally generally generally heard heard heard heard heard heard heard heard, 
    and and and and and and and and and if if if if if if if if if if it it it it it it it it it it it is is is is is is is is is is is is it it it it it it it it it it it it it doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't matter matter matter matter matter matter matter matter matter matter matter matter matter matter matter!
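The stutter relies on Perl's repetition operator x behaving differently for a parenthesized list than for a plain string. A small sketch of that difference, with sample values chosen only for illustration:

```perl
use strict;
use warnings;

# ("ab") x 3 repeats the one-element list three times.
my @copies = ("ab") x 3;
my $joined = join " ", @copies;

# Without the parentheses, x repeats the string itself instead.
my $string = "ab" x 3;

print "$joined\n";    # prints "ab ab ab"
print "$string\n";    # prints "ababab"
```

That is why the replacement writes ($1) x ++$count: join then receives one copy of the word per count and separates the copies with spaces.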

Exploring Boundaries

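There are more robust approaches to word boundaries that work for plural possessives (the previous approaches don't), but I suspect your mystery lay in getting the ++$count to fire, not in the subtleties of \b behavior.

I really wish people understood that \b isn't what they think it is. They always think it means there is white space or the edge of the string there. They never think of it as a \w\W or \W\w transition.

# same as using a \b before:
(?(?=\w) (?<!\w)  | (?<!\W) )
# same as using a \b after:
(?(?<=\w) (?!\w)  | (?!\W)  )

As you see, it's conditional, depending on what it's touching. That's what the (?(COND)THEN|ELSE) clause is for.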
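To make the point about \b concrete, this small sketch lists the offsets at which \b succeeds in a short string. The sample string is only an illustration, not something from the question.

```perl
use strict;
use warnings;

my $s = "isn't it?";

# Record every offset at which the zero-width \b assertion succeeds.
my @offsets;
while ($s =~ /\b/g) {
    push @offsets, pos($s);
}

print "@offsets\n";    # prints "0 3 4 5 6 8"
```

Offsets 3 and 4 sit on either side of the apostrophe inside isn't, where there is no white space at all. Nothing matches after the final ?, because a non-word character next to the end of the string is not a transition.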

This becomes an issue with things like:

$_ = qq('Tis Paul's parents' summer-house, isn't it?\n);
my $count = 0;

s{
    (?(?=[\-\w']) (?<![\-\w'])  | (?<![^\-\w']) )
    ( [\-\w'] + )
    (?(?<=[\-\w']) (?![\-\w'])  | (?![^\-\w'])  )
}{
    sprintf "(%s)[%d]", $1, ++$count
}gsex;

print;

which correctly prints

('Tis)[1] (Paul's)[2] (parents')[3] (summer-house)[4], (isn't)[5] (it)[6]?
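For contrast, here is a sketch of what the plain \b pattern from the simple solution does with the same sentence. The variable names $text and $tagged are introduced here for illustration, and the substitution runs on a copy so the input stays intact.

```perl
use strict;
use warnings;

my $text  = qq('Tis Paul's parents' summer-house, isn't it?);
my $count = 0;

# Same substitution as the simple solution, applied to a copy of the text.
(my $tagged = $text) =~ s{
    \b ( [\w']+ ) \b
}{
    sprintf "(%s)[%d]", $1, ++$count;
}gex;

print "$tagged\n";
# prints: '(Tis)[1] (Paul's)[2] (parents)[3]' (summer)[4]-(house)[5], (isn't)[6] (it)[7]?
```

The leading apostrophe of 'Tis and the trailing one of parents' fall outside the match, and summer-house splits into two words. The conditional lookarounds are there to avoid exactly that.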

Worrying about Unicode

1960s-style ASCII is about 50 years out of date. Just as it's nearly always wrong whenever you see anyone write [a-z], it turns out that things like dashes and quotation marks shouldn't show up as literals in patterns, either. While we're at it, you probably don't want to use \w, because that includes numbers and underscores as well, not just alphabetics.

Imagine this string:

$_ = qq(\x{2019}Tis Ren\x{E9}e\x{2019}s great\x{2010}grandparents\x{2019} summer\x{2010}house, isn\x{2019}t it?\n);

which you could write as a literal with use utf8:

use utf8;
$_ = qq(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?\n);

This time I'll go at the pattern a bit differently, separating my definition of terms from their execution in an attempt to make it more readable and therefore more maintainable:

#!/usr/bin/perl -l
use 5.10.0;
use utf8;
use open qw< :std :utf8 >;
use strict;
use warnings qw< FATAL all >;
use autodie;

$_ = q(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?);

my $count = 0;

s{ (?<WORD> (?&full_word)  )

   # the rest is just definition
   (?(DEFINE)

     (?<word_char>   [\p{Alphabetic}\p{Quotation_Mark}] )

     (?<full_word>

             # next line won't compile cause
             # fears variable-width lookbehind
             ####  (?<! (?&word_char) )   )
             # so must inline it

         (?<! [\p{Alphabetic}\p{Quotation_Mark}] )

         (?&word_char)
         (?:
             \p{Dash}
           | (?&word_char)
         ) *

         (?!  (?&word_char) )
     )

   )   # end DEFINE declaration block

}{
    sprintf "(%s)[%d]", $+{WORD}, ++$count;
}gsex;

print;

That code, when run, produces this:

(’Tis)[1] (Renée’s)[2] (great‐grandparents’)[3] (summer‐house)[4], (isn’t)[5] (it)[6]?

OK, so that may have been FMTEYEWTK about fancy regexes, but aren't you glad you asked? ☺

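Answer 1: (score: 0)

As far as I know, not in plain regular expressions.

On the other hand, several tools offer it as an extension, for example grepWin. From the tool's help (press F1):

[Image: grepWin help regarding replacement placeholders]

Internally, it uses Boost's Perl Regular Expression engine, but ${count} is implemented within grepWin itself (as are its other extensions).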