This question was inspired by this other one.
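Compare
s/,(\d)/$1/
with
s/,(?=\d)//
The former uses a capture group to put the digit back while dropping the comma; the latter uses a lookahead to check whether the comma is followed by a digit. Why is the latter sometimes faster, as discussed in this answer?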
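Before comparing speed, it helps to confirm that the two substitutions give the same result. The following is a minimal sketch; the sample strings are made up for illustration and do not come from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample inputs, chosen only for illustration.
my @samples = ('1,234', ',,,5', 'a,b', ',');

my @results;
for my $input (@samples) {
    # Copy the input, then apply each substitution to its own copy.
    (my $captured  = $input) =~ s/,(\d)/$1/;
    (my $lookahead = $input) =~ s/,(?=\d)//;
    push @results, [$input, $captured, $lookahead];
    printf "%-6s capture: %-6s lookahead: %-6s\n",
        $input, $captured, $lookahead;
}
```

Both forms remove the first comma that is directly followed by a digit and leave every other string untouched, so any timing difference comes from how the regex engine does the work, not from the result.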
Answer 0 (score: 4)
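The two approaches do different things and have different overhead costs. When capturing, perl has to copy the captured text. A lookahead matches without consuming anything; it just has to mark the position where it started. You can see what the regex engine is doing with the re 'debug' pragma: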
use re 'debug';
my $capture = qr/,(\d)/;
Compiling REx ",(\d)"
Final program:
   1: EXACT <,> (3)
   3: OPEN1 (5)
   5:   DIGIT (6)
   6: CLOSE1 (8)
   8: END (0)
anchored "," at 0 (checking anchored) minlen 2
Freeing REx: ",(\d)"
use re 'debug';
my $lookahead = qr/,(?=\d)/;
Compiling REx ",(?=\d)"
Final program:
   1: EXACT <,> (3)
   3: IFMATCH[0] (8)
   5:   DIGIT (6)
   6:   SUCCEED (0)
   7: TAIL (8)
   8: END (0)
anchored "," at 0 (checking anchored) minlen 1
Freeing REx: ",(?=\d)"
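I would expect the lookahead to be faster than the capture in most cases, but as noted in the other thread, regex performance can depend on the data.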
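The difference between consuming and not consuming the digit can be observed through the match offsets. This is a small sketch using the built-in @- and @+ arrays; the sample string is an assumption chosen for illustration.

```perl
use strict;
use warnings;

my $text = 'a,1,b';    # hypothetical sample string

# Capture version: the overall match covers the comma and the digit.
my ($cap_start, $cap_end, $cap_group);
if ($text =~ /,(\d)/) {
    ($cap_start, $cap_end, $cap_group) = ($-[0], $+[0], $1);
}

# Lookahead version: the overall match covers the comma only.
my ($la_start, $la_end);
if ($text =~ /,(?=\d)/) {
    ($la_start, $la_end) = ($-[0], $+[0]);
}

printf "capture:   matched '%s' (offsets %d-%d), \$1 = '%s'\n",
    substr($text, $cap_start, $cap_end - $cap_start),
    $cap_start, $cap_end, $cap_group;
printf "lookahead: matched '%s' (offsets %d-%d)\n",
    substr($text, $la_start, $la_end - $la_start),
    $la_start, $la_end;
```

The capture version matches two characters and fills $1, while the lookahead version matches one character and sets no capture group, which is consistent with the minlen 2 and minlen 1 lines in the debug output.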
Answer 1 (score: -1)
As always, when you want to know which of two pieces of code is faster, you have to test it:
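#!/usr/bin/perl
use 5.012;
use warnings;
use Benchmark qw<cmpthese>;

# Case 1: commas only, so neither pattern ever matches.
say "Extreme ,,,:";
my $Text = ',' x (my $LEN = 512);
# A negative time means: run each sub for at least 10 CPU seconds.
cmpthese my $TIME = -10, my $CMP = {
    capture   => \&capture,
    lookahead => \&lookahead,
};

# Case 2: every comma is followed by a digit, so the first comma matches.
say "\nExtreme ,0,0,0:";
$Text = ',0' x $LEN;
cmpthese $TIME, $CMP;

# Case 3: mostly commas, with a few ",0" pairs at the very end.
my $P = 0.01;
say "\nMixed (@{[$P * 100]}% zeros):";
my $zeros = $LEN * $P;
$Text = ',' x ($LEN - $zeros) . ',0' x $zeros;
cmpthese $TIME, $CMP;

# Each sub works on a fresh copy of $Text, so every run sees the same input.
sub capture {
    local $_ = $Text;
    s/,(\d)/$1/;
}

sub lookahead {
    local $_ = $Text;
    s/,(?=\d)//;
}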
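The benchmark covers three different cases: a string of commas only, a string made up entirely of ",0" pairs, and a mixed string that is mostly commas with about 1% ",0" pairs at the end.
On my machine, with my version of perl, it produces the following results: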
Extreme ,,,:
             Rate   capture lookahead
capture   23157/s        --       -1%
lookahead 23362/s        1%        --

Extreme ,0,0,0:
               Rate   capture lookahead
capture    419476/s        --      -65%
lookahead 1200465/s      186%        --

Mixed (1% zeros):
             Rate   capture lookahead
capture   22013/s        --       -4%
lookahead 22919/s        4%        --
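The percentages in these tables can be recomputed from the rates. The sketch below assumes each percentage is the row's rate relative to the column's rate, rounded to a whole number; the helper percent_diff and the case labels are hypothetical names introduced for illustration.

```perl
use strict;
use warnings;

# Rates (iterations per second) copied from the tables above.
my %rates = (
    'only commas' => { capture => 23157,  lookahead => 23362   },
    ',0 pairs'    => { capture => 419476, lookahead => 1200465 },
    '1% zeros'    => { capture => 22013,  lookahead => 22919   },
);

# Relative difference of the row rate versus the column rate,
# as a rounded percentage (hypothetical helper).
sub percent_diff {
    my ($row, $col) = @_;
    return sprintf '%.0f', ($row / $col - 1) * 100;
}

my %lookahead_gain;
for my $case (sort keys %rates) {
    my ($cap, $la) = @{ $rates{$case} }{qw(capture lookahead)};
    $lookahead_gain{$case} = percent_diff($la, $cap);
    printf "%-12s lookahead vs capture: %4s%%, capture vs lookahead: %4s%%\n",
        $case, $lookahead_gain{$case}, percent_diff($cap, $la);
}
```

Read this way, 186% means the lookahead version ran close to three times as many iterations per second as the capture version in the ,0,0,0 case, while the other two cases differ by only a few percent.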
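These results support the hypothesis that the lookahead version is noticeably faster than the capture version (186% faster in the ,0,0,0 case), except when the text consists almost entirely of commas, where the two are within a few percent of each other. As PSIA already explained in his comment, this is not really surprising.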
Regards, Matthias