Why is lookahead (sometimes) faster than capturing?

Date: 2012-12-03 11:57:34

Tags: regex perl lookahead

This question was inspired by this other one.

s/,(\d)/$1/s/,(?=\d)//进行比较:前者使用捕获组仅替换数字而不替换逗号,后者使用前瞻来确定逗号是否由数字继承。为什么后者有时会更快,如this answer中讨论的那样?

2 answers:

Answer 0 (score: 4)

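The two approaches do different things and have different overhead costs. When capturing, perl has to make a copy of the captured text. A lookahead matches without consuming anything; it only has to remember the position where it started. You can see what is going on with the re 'debug' pragma:
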
use re 'debug';
my $capture = qr/,(\d)/;

Compiling REx ",(\d)"
Final program:
   1: EXACT <,> (3)
   3: OPEN1 (5)
   5:   DIGIT (6)
   6: CLOSE1 (8)
   8: END (0)
anchored "," at 0 (checking anchored) minlen 2 
Freeing REx: ",(\d)"

use re 'debug';
my $lookahead = qr/,(?=\d)/;

Compiling REx ",(?=\d)"
Final program:
   1: EXACT <,> (3)
   3: IFMATCH[0] (8)
   5:   DIGIT (6)
   6:   SUCCEED (0)
   7: TAIL (8)
   8: END (0)
anchored "," at 0 (checking anchored) minlen 1 
Freeing REx: ",(?=\d)"
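To make "doesn't consume" concrete, here is a small sketch that records the match offsets through Perl's @- and @+ arrays. The sample string "x,7" is made up. The sketch only shows the zero-width behaviour of the lookahead; it does not measure the cost of copying $1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = "x,7";    # hypothetical sample input

# Capture version: the match spans ",7" and the digit is copied into $1.
my ($cap_start, $cap_end, $cap_group);
if ($s =~ /,(\d)/) {
    ($cap_start, $cap_end, $cap_group) = ($-[0], $+[0], $1);
}

# Lookahead version: (?=\d) checks the digit without consuming it,
# so the overall match covers only the comma and no group is captured.
my ($la_start, $la_end);
if ($s =~ /,(?=\d)/) {
    ($la_start, $la_end) = ($-[0], $+[0]);
}

printf "capture:   [%d,%d) group=%s\n", $cap_start, $cap_end, $cap_group;  # [1,3) group=7
printf "lookahead: [%d,%d)\n", $la_start, $la_end;                       # [1,2)
```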

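I would expect the lookahead to be faster than the capture in most cases, but, as noted in the other thread, regex performance can depend on the data.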

Answer 1 (score: -1)

As always, when you want to know which of two pieces of code is faster, you have to test it:

#!/usr/bin/perl

use 5.012;
use warnings;
use Benchmark qw<cmpthese>;

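say "Extreme ,,,:";
# Case 1: 512 commas, none of them followed by a digit.
my $Text = ',' x (my $LEN = 512);
# A negative count makes each sub run for at least 10 CPU seconds.
cmpthese my $TIME = -10, my $CMP = {
    capture => \&capture,
    lookahead => \&lookahead,
};

say "\nExtreme ,0,0,0:";
# Case 2: every comma is followed by a digit.
$Text = ',0' x $LEN;
cmpthese $TIME, $CMP;

# Case 3: about 1% ',0' pairs at the end, the rest plain commas.
my $P = 0.01;
say "\nMixed (@{[$P * 100]}% zeros):";
my $zeros = $LEN * $P;
$Text = ',' x ($LEN - $zeros) . ',0' x $zeros;
cmpthese $TIME, $CMP;

# Each sub performs a single substitution (no /g) on a fresh copy of $Text.
sub capture {
    local $_ = $Text;
    s/,(\d)/$1/;
}

sub lookahead {
    local $_ = $Text;
    s/,(?=\d)//;
}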
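Before trusting a benchmark like this, it is worth checking that both subs actually do the same thing on each input. A minimal sketch, assuming inputs shaped like the benchmark's (the 1% case is rounded down to 5 ',0' pairs here):

```perl
#!/usr/bin/perl
use 5.012;
use warnings;

# Roughly the benchmark's three inputs; int(5.12) gives 5 ',0' pairs in the mixed case.
my $LEN    = 512;
my $zeros  = int($LEN * 0.01);
my %inputs = (
    'only ,'   => ',' x $LEN,
    'only ,0'  => ',0' x $LEN,
    'mixed 1%' => ',' x ($LEN - $zeros) . ',0' x $zeros,
);

for my $name (sort keys %inputs) {
    # Apply each substitution (single, no /g, as in the benchmark) to its own copy.
    (my $cap = $inputs{$name}) =~ s/,(\d)/$1/;
    (my $la  = $inputs{$name}) =~ s/,(?=\d)//;
    die "results differ for '$name'\n" unless $cap eq $la;
    say "$name: identical";
}
```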

The benchmark covers three different cases:

  1. Only ','
  2. Only ',0'
  3. 1% ',0', the rest ','

On my machine, with my perl version, it produces the following results:

    Extreme ,,,:
                 Rate   capture lookahead
    capture   23157/s        --       -1%
    lookahead 23362/s        1%        --
    
    Extreme ,0,0,0:
                   Rate   capture lookahead
    capture    419476/s        --      -65%
    lookahead 1200465/s      186%        --
    
    Mixed (1% zeros):
                 Rate   capture lookahead
    capture   22013/s        --       -4%
    lookahead 22919/s        4%        --
    

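These results confirm the assumption that the lookahead version is significantly faster than the capture, except when the text consists almost entirely of commas. As PSIA already explained in his comment, this is not really surprising.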
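One plausible reading of the numbers: because neither sub uses /g, each call performs at most one substitution, so much of the cost depends on how far the engine has to scan before the first match. The sketch below (inputs shaped like the benchmark's, with 5 ',0' pairs standing in for 1%) prints where that first match is. This is an interpretation, not something the benchmark measures directly.

```perl
#!/usr/bin/perl
use 5.012;
use warnings;

# Inputs shaped like the benchmark's; 5 trailing ',0' pairs stand in for "1%".
my $LEN    = 512;
my %inputs = (
    'only ,'   => ',' x $LEN,
    'only ,0'  => ',0' x $LEN,
    'mixed 1%' => ',' x ($LEN - 5) . ',0' x 5,
);

# Offset of the first comma followed by a digit, or undef if there is none.
my %first;
for my $name (sort keys %inputs) {
    $first{$name} = $inputs{$name} =~ /,(?=\d)/ ? $-[0] : undef;
    say "$name: ", defined $first{$name} ? "first match at offset $first{$name}" : 'no match';
}
```

In the mixed case the first match is at offset 507, near the end of the string, so the scan costs about as much as in the all-comma case, where there is no match at all. Only the ',0' input matches at offset 0, which is where a per-match difference such as copying $1 would weigh most.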

Regards, Matthias