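Removing `^` from `s/^/1/;` makes my code fail. Why?

Asked: 2014-07-29 23:40:49

Tags: regex perl

I have been working on this problem on Code Golf Stack Exchange, which is why my code looks a bit odd.

Here is a program, with use strict and use warnings, that reproduces the problem: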

use strict;
use warnings;

$_ = "";

for my $i (1..33){
    s//1/;   # Just prepends 1 to the string $_
}
print "$_\n";

for my $i (34..127) {
    if( chr(y/1/1/) !~ /[!"'()*+,-.\/12357:;<=>?CEFGHIJKLMNSTUVWXYZ[\\\]^_`cfhijklmnrstuvwxyz{|}~]/ ) {
        print chr y/1/1/;
    }
    s/^/1/;   # Prepends 1 to the start of the string.
}

This is the output:

111111111111111111111111111111111
#$%&04689@ABDOPQRabdegopq

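This is what I expected. However, when I take the ^ out of the second substitution, it no longer matches at the start of the string, so the string stops growing.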

use strict;
use warnings;

$_ = "";

for my $i (1..33){
    s//1/;
}
print "$_\n";

for my $i (34..127) {
    if( chr(y/1/1/) !~ /[!"'()*+,-.\/12357:;<=>?CEFGHIJKLMNSTUVWXYZ[\\\]^_`cfhijklmnrstuvwxyz{|}~]/ ) {
        print chr y/1/1/;
    }
    s//1/;   # No Longer matches!
}

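Why does this happen? s//1/ works in the first loop, so why does changing to it in the second loop break everything?

Another confusing point: if you wrap the if block in an extra pair of braces, the substitution matches again: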

for my $i (34..127) {
    {
        if( chr(y/1/1/) !~ /[!"'()*+,-.\/12357:;<=>?CEFGHIJKLMNSTUVWXYZ[\\\]^_`cfhijklmnrstuvwxyz{|}~]/ ) {
            print chr y/1/1/;
        }
    }
    s//1/;   # This prepends 1 to the string $_ again.
}

Edit:

I want to add the original code back into the question for reference:

use strict;
use warnings;
$_="";
until( y/1/1/ > 32){
    print "test1";
    s//1/;
    print "test";
}
print "$_\n";
until( y/1/1/ > 125+1 ) {
    if( chr(y/1/1/) !~ /[!"'()*+,-.\/12357:;<=>?CEFGHIJKLMNSTUVWXYZ[\\\]^_`cfhijklmnrstuvwxyz{|}~]/ ) {
        print chr y/1/1/;
    }

    s/^/1/; # this is the line we remove ^ from
}

When we remove the ^ from that line, the output changes from:
test1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1test111111111111111111111111111111111
#$%&04689@ABDOPQRabdegopq

  

to hanging with no output.

So in this case, a change to a line in the second loop appears to change the behavior of the first loop.

3 answers:

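Answer 0: (score: 5)

s//1/; does not match against "nothing" or an empty string. It reuses the last regular expression that matched successfully. So the first loop uses the default (genuinely empty) pattern, while the second loop uses the pattern from the last successful check in the if above it.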

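Quoting the documentation:

  If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead. In this case, only the g and c flags on the empty pattern are honored

See The empty pattern //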
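
To make this concrete, here is a minimal sketch of the situation in the question, reduced to one successful match followed by one substitution. The helper name length_after_prepend and the shortened character class are made up for illustration. The string starts at 33 characters, as it does after the question's first loop.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the question): start with 33 ones, run one
# successful match, then try to prepend a "1" and report the new length.
sub length_after_prepend {
    my ($use_empty_pattern) = @_;
    my $s = "1" x 33;

    $s =~ /[12357]/;      # succeeds, so this becomes the "last successful pattern"

    if ($use_empty_pattern) {
        $s =~ s//1/;      # empty pattern: behaves like s/[12357]/1/, swaps a 1 for a 1
    }
    else {
        $s =~ s/^/1/;     # non-empty pattern: really inserts at the start
    }
    return length $s;
}

print "with s//1/:  ", length_after_prepend(1), "\n";   # 33
print "with s/^/1/: ", length_after_prepend(0), "\n";   # 34
```

With s//1/ the length stays at 33, which is why a loop that waits for the string to grow never finishes.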

Answer 1: (score: 2)

Expanding on VladimirM's answer:

print "regex have dynamic scope\n";
$_ = 1;
{
    m/1/;
    s//2/;
    print "$_  one becomes two, s//2/ is really s/1/2/\n";
}
$_=1;
{
    m/1/;
    {
        s//2/;
    }
    print "$_  one still becomes two, s//2/ is really s/1/2/\n";
}

$_=1;
{
    {
        m/1/;
    }
    s//2/;
    print "$_  one becomes twentyone, s//2/; is really s/(?:)//2;\n";
}

__END__
regex have dynamic scope
2  one becomes two, s//2/ is really s/1/2/
2  one still becomes two, s//2/ is really s/1/2/
21  one becomes twentyone, s//2/; is really s/(?:)//2;

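Because regexes have dynamic scope, using The empty pattern // really means using the previous successfully matched pattern from the same dynamic scope, so don't do it :)

If you add use re 'debug';, you can see that the regex engine uses the previous pattern (look at the Matching REx lines: NOTHING(2) is the genuinely empty pattern with no previous match, and EXACT <1>(3) is the previous pattern):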
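
If the goal is simply to prepend, a few spellings do not depend on any earlier match. The sketch below is only an illustration: the helper name prepend_one and its mode names are made up, and the 'empty' branch is there for comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: run one successful match, then prepend "1" in the
# way selected by $how, and return the resulting string.
sub prepend_one {
    my ($how) = @_;
    my $s = "111";

    $s =~ /1/;    # a successful match precedes every variant

    if ($how eq 'anchor') {
        $s =~ s/^/1/;            # explicit start-of-string anchor
    }
    elsif ($how eq 'group') {
        $s =~ s/(?:)/1/;         # empty group: the pattern text itself is not empty
    }
    elsif ($how eq 'substr') {
        substr($s, 0, 0, "1");   # 4-argument substr, no regex involved
    }
    elsif ($how eq 'empty') {
        $s =~ s//1/;             # for comparison: reuses /1/, so nothing is added
    }
    return $s;
}

for my $how (qw(anchor group substr empty)) {
    printf "%-6s %s\n", $how, prepend_one($how);
}
```

The first three variants return 1111. Only the empty pattern leaves the string at 111.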

regex have dynamic scope
Guessing start of match in sv for REx "1" against "1"
Found anchored substr "1" at offset 0...
Guessed: match at offset 0
Guessing start of match in sv for REx "1" against "1"
Found anchored substr "1" at offset 0...
Guessed: match at offset 0
Matching REx "1" against "1"
   0 <> <1>                  |  1:EXACT <1>(3)
   1 <1> <>                  |  3:END(0)
Match successful!
2  one becomes two, s//2/ is really s/1/2/
Guessing start of match in sv for REx "1" against "1"
Found anchored substr "1" at offset 0...
Guessed: match at offset 0
Guessing start of match in sv for REx "1" against "1"
Found anchored substr "1" at offset 0...
Guessed: match at offset 0
Matching REx "1" against "1"
   0 <> <1>                  |  1:EXACT <1>(3)
   1 <1> <>                  |  3:END(0)
Match successful!
2  one still becomes two, s//2/ is really s/1/2/
Guessing start of match in sv for REx "1" against "1"
Found anchored substr "1" at offset 0...
Guessed: match at offset 0
Matching REx "" against "1"
   0 <> <1>                  |  1:NOTHING(2)
   0 <> <1>                  |  2:END(0)
Match successful!
21  one becomes twentyone, s//2/; is really s/(?:)//2;

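Update, because you have an infinite loop: the last successful pattern always contains a 1, so the substitution is effectively s/1/1/;, which means your string never grows and always stays at 33 characters. See the updated code :)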

$_="";
until( y/1/1/ > 32){
    print "test1";
    s//1/;
    print "test";
}
print "$_\n";
my $max = 126;
my $count = 0;
my $reps = 0;
until( y/1/1/ > 125+1 ) {
    if( chr(y/1/1/) !~ /[!"'()*+,-.\/12357:;<=>?CEFGHIJKLMNSTUVWXYZ[\\\]^_`cfhijklmnrstuvwxyz{|}~]/ ) {
        print chr y/1/1/;
    }
$reps =
#~     s/^/1/; # win
    s//1/; # fail
    $count++;
    last if $count > $max;
}
print "m $max c $count r $reps l @{[ length $_ ]}\n";
__END__
win #$%&04689@ABDOPQRabdegopqm 126 c 94 r 1 l 127
fail m 126 c 127 r 1 l 33

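Unless you are deliberately obfuscating, appending does the same job as prepending here: $_ .= 1;

Answer 2: (score: 1)

To second VladimirM's answer and expand on it: the empty pattern // is the problem. The following is from perldoc: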

  
      
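  • The empty pattern //

    If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead. In this case, only the g and c flags on the empty pattern are honored; the other flags are taken from the original pattern. If no match has previously succeeded, this will (silently) act instead as a genuine empty pattern (which will always match).

Basically, if another regex has already matched successfully in the same scope, the LHS of a substitution with an empty pattern is really the LHS of that previous regex.

In the example below, inspired by the OP's code, I use the ones digit of the counter to extend the string. However, as soon as the other regex matches the exclamation mark, chr(33), the LHS of the empty regex changes. It then starts matching the digits 12357 and replacing them with the counter's ones digit, so the string stays the same length.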
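
Note the word "successfully" in the quoted documentation: a match that fails does not replace the remembered pattern. Here is a small sketch of that detail; the sub name empty_after_failed_match is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical demo: a failed match does not become the "last successful
# pattern", so the empty pattern still refers to the earlier /b/.
sub empty_after_failed_match {
    my $s = "abc";
    $s =~ /b/;       # succeeds: /b/ is now the last successful pattern
    $s =~ /z/;       # fails: the remembered pattern is still /b/
    $s =~ s//X/;     # empty pattern: behaves like s/b/X/
    return $s;
}

print empty_after_failed_match(), "\n";   # aXc
```

The substitution replaces the b rather than inserting at the front, which shows that /z/ was never remembered.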

use strict;
use warnings;

$_ = "";

for my $i (1..127) {
    my $chr = chr(length);

    if( $chr =~ m'(?![#$%&])[[:punct:]12357CE-NS-Zcfh-nr-z]' ) {
        print "'$chr'";
    } else {
        print "   ";
    }

    s//$i % 10/e;

    printf "% 4d %s\n", $i, $_;
}

The following output shows this clearly:

      1 1
      2 21
      3 321
      4 4321
      5 54321
      6 654321
      7 7654321
      8 87654321
      9 987654321
     10 0987654321
     11 10987654321
     12 210987654321
     13 3210987654321
     14 43210987654321
     15 543210987654321
     16 6543210987654321
     17 76543210987654321
     18 876543210987654321
     19 9876543210987654321
     20 09876543210987654321
     21 109876543210987654321
     22 2109876543210987654321
     23 32109876543210987654321
     24 432109876543210987654321
     25 5432109876543210987654321
     26 65432109876543210987654321
     27 765432109876543210987654321
     28 8765432109876543210987654321
     29 98765432109876543210987654321
     30 098765432109876543210987654321
     31 1098765432109876543210987654321
     32 21098765432109876543210987654321
     33 321098765432109876543210987654321
'!'  34 421098765432109876543210987654321
'!'  35 451098765432109876543210987654321
'!'  36 461098765432109876543210987654321
'!'  37 467098765432109876543210987654321
'!'  38 468098765432109876543210987654321
'!'  39 468098965432109876543210987654321
'!'  40 468098960432109876543210987654321
'!'  41 468098960412109876543210987654321
'!'  42 468098960422109876543210987654321
'!'  43 468098960432109876543210987654321
'!'  44 468098960442109876543210987654321
'!'  45 468098960445109876543210987654321
'!'  46 468098960446109876543210987654321
'!'  47 468098960446709876543210987654321
'!'  48 468098960446809876543210987654321
'!'  49 468098960446809896543210987654321
'!'  50 468098960446809896043210987654321
'!'  51 468098960446809896041210987654321
'!'  52 468098960446809896042210987654321
'!'  53 468098960446809896043210987654321
'!'  54 468098960446809896044210987654321
'!'  55 468098960446809896044510987654321
'!'  56 468098960446809896044610987654321
'!'  57 468098960446809896044670987654321
'!'  58 468098960446809896044680987654321
'!'  59 468098960446809896044680989654321
'!'  60 468098960446809896044680989604321
'!'  61 468098960446809896044680989604121
'!'  62 468098960446809896044680989604221
'!'  63 468098960446809896044680989604321
'!'  64 468098960446809896044680989604421
'!'  65 468098960446809896044680989604451
'!'  66 468098960446809896044680989604461
'!'  67 468098960446809896044680989604467
'!'  68 468098960446809896044680989604468
'!'  69 468098960446809896044680989604468
'!'  70 468098960446809896044680989604468
'!'  71 468098960446809896044680989604468
'!'  72 468098960446809896044680989604468
'!'  73 468098960446809896044680989604468
'!'  74 468098960446809896044680989604468
'!'  75 468098960446809896044680989604468
'!'  76 468098960446809896044680989604468
'!'  77 468098960446809896044680989604468
'!'  78 468098960446809896044680989604468
'!'  79 468098960446809896044680989604468
'!'  80 468098960446809896044680989604468
'!'  81 468098960446809896044680989604468
'!'  82 468098960446809896044680989604468
'!'  83 468098960446809896044680989604468
'!'  84 468098960446809896044680989604468
'!'  85 468098960446809896044680989604468
'!'  86 468098960446809896044680989604468
'!'  87 468098960446809896044680989604468
'!'  88 468098960446809896044680989604468
'!'  89 468098960446809896044680989604468
'!'  90 468098960446809896044680989604468
'!'  91 468098960446809896044680989604468
'!'  92 468098960446809896044680989604468
'!'  93 468098960446809896044680989604468
'!'  94 468098960446809896044680989604468
'!'  95 468098960446809896044680989604468
'!'  96 468098960446809896044680989604468
'!'  97 468098960446809896044680989604468
'!'  98 468098960446809896044680989604468
'!'  99 468098960446809896044680989604468
'!' 100 468098960446809896044680989604468
'!' 101 468098960446809896044680989604468
'!' 102 468098960446809896044680989604468
'!' 103 468098960446809896044680989604468
'!' 104 468098960446809896044680989604468
'!' 105 468098960446809896044680989604468
'!' 106 468098960446809896044680989604468
'!' 107 468098960446809896044680989604468
'!' 108 468098960446809896044680989604468
'!' 109 468098960446809896044680989604468
'!' 110 468098960446809896044680989604468
'!' 111 468098960446809896044680989604468
'!' 112 468098960446809896044680989604468
'!' 113 468098960446809896044680989604468
'!' 114 468098960446809896044680989604468
'!' 115 468098960446809896044680989604468
'!' 116 468098960446809896044680989604468
'!' 117 468098960446809896044680989604468
'!' 118 468098960446809896044680989604468
'!' 119 468098960446809896044680989604468
'!' 120 468098960446809896044680989604468
'!' 121 468098960446809896044680989604468
'!' 122 468098960446809896044680989604468
'!' 123 468098960446809896044680989604468
'!' 124 468098960446809896044680989604468
'!' 125 468098960446809896044680989604468
'!' 126 468098960446809896044680989604468
'!' 127 468098960446809896044680989604468