Why do these two RegEx benchmarks differ so much?
They use the same RegEx, one inline and one via qr//.
Results:
                       Rate rege1.FIND_AT_END rege2.FIND_AT_END
rege1.FIND_AT_END  661157/s                --              -85%
rege2.FIND_AT_END 4384042/s              563%                --

                  Rate rege1.NOFIND rege2.NOFIND
rege1.NOFIND  678702/s           --         -87%
rege2.NOFIND 5117707/s         654%           --

                         Rate rege1.FIND_AT_START rege2.FIND_AT_START
rege1.FIND_AT_START  657765/s                  --                -85%
rege2.FIND_AT_START 4268032/s                549%                  --
# Benchmark
use Benchmark qw(:all);
my $count = 10000000;
my $re = qr/abc/o;    # precompiled pattern object
# One input per scenario: no match, match near the end, match at the start.
my %tests = (
    "NOFIND " => "cvxcvidgds.sdfpkisd[s"
    ,"FIND_AT_END " => "cvxcvidgds.sdfpabcd[s"
    ,"FIND_AT_START " => "abccvidgds.sdfpkisd[s"
);
foreach my $type (keys %tests) {
    my $str = $tests{$type};
    cmpthese($count, {
        # rege1 matches through the qr// object, rege2 uses the inline literal.
        "rege1.$type" => sub { my $idx = ($str =~ $re); },
        "rege2.$type" => sub { my $idx = ($str =~ /abc/o); }
    });
}
Answer 0 (score: 3)
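You are dealing with operations that are inherently very fast, so you need to run a few more tests to narrow down where the time goes. I also changed the benchmark from external repetition (letting cmpthese do it) to internal repetition (a for loop) to magnify the speed difference. This minimizes the overhead of the subroutine call and of any work cmpthese itself has to do. Finally, it is important to test whether the difference scales with the magnitude (in this case it does not).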
use feature 'say';    # say is not enabled by default
use Benchmark 'cmpthese';
my $re = qr/abc/o;
my %tests = (
    'fail ' => 'cvxcvidgds.sdfpkisd[s',
    'end ' => 'cvxcvidgds.sdfpabcd[s',
    'start' => 'abccvidgds.sdfpkisd[s',
);
# Repeat each match inside the timed sub, at increasing loop counts,
# so the per-call overhead is amortized over many matches.
for my $mag (map 10**$_, 1 .. 5) {
    say "\n$mag:";
    for my $type (keys %tests) {
        my $str = $tests{$type};
        cmpthese -1, {
            '$re '.$type => sub {my $i; $i = ($str =~ $re ) for 0 .. $mag},
            '/abc/o '.$type => sub {my $i; $i = ($str =~ /abc/o) for 0 .. $mag},
            '/$re/ '.$type => sub {my $i; $i = ($str =~ /$re/ ) for 0 .. $mag},
            '/$re/o '.$type => sub {my $i; $i = ($str =~ /$re/o) for 0 .. $mag},
        }
    }
}
10:
                  Rate     $re fail   /$re/ fail  /$re/o fail  /abc/o fail
$re fail      106390/s           --          -8%         -72%         -74%
/$re/ fail    115814/s           9%           --         -70%         -71%
/$re/o fail   384635/s         262%         232%           --          -5%
/abc/o fail   403944/s         280%         249%           5%           --

                  Rate      $re end    /$re/ end   /$re/o end   /abc/o end
$re end       105527/s           --          -5%         -71%         -72%
/$re/ end     110902/s           5%           --         -69%         -71%
/$re/o end    362544/s         244%         227%           --          -5%
/abc/o end    382242/s         262%         245%           5%           --

                  Rate    $re start  /$re/ start /$re/o start /abc/o start
$re start     111002/s           --          -3%         -72%         -73%
/$re/ start   114094/s           3%           --         -71%         -73%
/$re/o start  390693/s         252%         242%           --          -6%
/abc/o start  417123/s         276%         266%           7%           --

100:
                  Rate   /$re/ fail     $re fail  /$re/o fail  /abc/o fail
/$re/ fail     12329/s           --          -4%         -77%         -79%
$re fail       12789/s           4%           --         -76%         -78%
/$re/o fail    53194/s         331%         316%           --          -9%
/abc/o fail    58377/s         373%         356%          10%           --

                  Rate      $re end    /$re/ end   /$re/o end   /abc/o end
$re end        12440/s           --          -1%         -75%         -77%
/$re/ end      12623/s           1%           --         -75%         -77%
/$re/o end     50127/s         303%         297%           --          -7%
/abc/o end     53941/s         334%         327%           8%           --

                  Rate    $re start  /$re/ start /$re/o start /abc/o start
$re start      12810/s           --          -3%         -76%         -78%
/$re/ start    13190/s           3%           --         -75%         -77%
/$re/o start   52512/s         310%         298%           --          -8%
/abc/o start   57045/s         345%         332%           9%           --

1000:
                  Rate     $re fail   /$re/ fail  /$re/o fail  /abc/o fail
$re fail        1248/s           --          -8%         -76%         -80%
/$re/ fail      1354/s           9%           --         -74%         -79%
/$re/o fail     5284/s         323%         290%           --         -16%
/abc/o fail     6311/s         406%         366%          19%           --

                  Rate      $re end    /$re/ end   /$re/o end   /abc/o end
$re end         1316/s           --          -1%         -74%         -77%
/$re/ end       1330/s           1%           --         -74%         -77%
/$re/o end      5119/s         289%         285%           --         -11%
/abc/o end      5757/s         338%         333%          12%           --

                  Rate  /$re/ start    $re start /$re/o start /abc/o start
/$re/ start     1283/s           --          -1%         -75%         -81%
$re start       1302/s           1%           --         -75%         -80%
/$re/o start    5119/s         299%         293%           --         -22%
/abc/o start    6595/s         414%         406%          29%           --

10000:
                  Rate   /$re/ fail     $re fail  /$re/o fail  /abc/o fail
/$re/ fail       130/s           --          -6%         -76%         -80%
$re fail         139/s           7%           --         -74%         -79%
/$re/o fail      543/s         317%         291%           --         -17%
/abc/o fail      651/s         400%         368%          20%           --

                  Rate    /$re/ end      $re end   /$re/o end   /abc/o end
/$re/ end        128/s           --          -3%         -76%         -79%
$re end          132/s           3%           --         -76%         -78%
/$re/o end       541/s         322%         311%           --         -10%
/abc/o end       598/s         366%         354%          11%           --

                  Rate  /$re/ start    $re start /$re/o start /abc/o start
/$re/ start      132/s           --          -1%         -77%         -80%
$re start        133/s           1%           --         -76%         -79%
/$re/o start     566/s         330%         325%           --         -13%
/abc/o start     650/s         394%         388%          15%           --

100000:
                  Rate   /$re/ fail     $re fail  /$re/o fail  /abc/o fail
/$re/ fail      13.2/s           --          -8%         -76%         -78%
$re fail        14.2/s           8%           --         -74%         -76%
/$re/o fail     55.9/s         325%         292%           --          -8%
/abc/o fail     60.5/s         360%         324%           8%           --

                  Rate    /$re/ end      $re end   /$re/o end   /abc/o end
/$re/ end       12.8/s           --          -3%         -75%         -79%
$re end         13.2/s           3%           --         -75%         -78%
/$re/o end      52.3/s         308%         297%           --         -12%
/abc/o end      59.7/s         365%         353%          14%           --

                  Rate    $re start  /$re/ start /$re/o start /abc/o start
$re start       13.4/s           --          -2%         -77%         -78%
/$re/ start     13.6/s           2%           --         -77%         -78%
/$re/o start    58.2/s         334%         328%           --          -6%
/abc/o start    62.2/s         364%         357%           7%           --
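You can easily see that the tests fall into two groups: those with /.../o in the source and those without. Since this is a syntactic difference, it is a clue that the compiler is probably optimizing this case (or allowing the runtime to cache something). The exact mechanism (dropping the check of the variable after the first use, simplifying the stack, and so on) is hard to pin down without reading the source.
The results may also depend on the version of perl in use. The tests above were run on v5.10.1.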
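The idea that /o lets the runtime keep a compiled pattern can be observed directly. The following is a minimal sketch, not taken from the answer: the helper names match_flags_with_o and match_flags_without_o are made up for illustration. It relies only on the documented behavior of /o, which interpolates and compiles the pattern once per match operator, so a later change to the interpolated variable is ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: match the fixed string 'xyz' against each pattern
# using /o. The pattern is interpolated and compiled on the first pass only,
# so the second pass still matches against 'abc'.
sub match_flags_with_o {
    my @flags;
    for my $pat ('abc', 'xyz') {
        push @flags, ('xyz' =~ /$pat/o ? 1 : 0);
    }
    return @flags;
}

# Hypothetical helper: same loop without /o. The match operator looks at
# $pat on every pass, so the second pass matches against 'xyz'.
sub match_flags_without_o {
    my @flags;
    for my $pat ('abc', 'xyz') {
        push @flags, ('xyz' =~ /$pat/ ? 1 : 0);
    }
    return @flags;
}

print "with /o:    @{[ match_flags_with_o() ]}\n";     # prints "with /o:    0 0"
print "without /o: @{[ match_flags_without_o() ]}\n";  # prints "without /o: 0 1"
```

This only shows that /o skips work on later executions; it does not by itself explain the size of the gap measured above.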
Answer 1 (score: 1)
First, /o has no effect here, because you are not interpolating anything into the pattern.
Now, to answer the question.
1/661157 s - 1/4384042 s = 0.000,001,3 s
1/678702 s - 1/5117707 s = 0.000,001,3 s
1/657765 s - 1/4268032 s = 0.000,001,3 s
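So =~ $re takes an extra 1.3 microseconds (or 0.68 on my machine). The =~ $re case has three extra Perl ops, which accounts for part of it. I am not sure why it is that much, though. One of the ops fetches $re, but I do not know what the other two do.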
>perl -MO=Concise,-exec -e"$x =~ /abc/"
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <#> gvsv[*x] s
4 </> match(/"abc"/) vKS/RTIME
5 <@> leave[1 ref] vKP/REFC
-e syntax OK
>perl -MO=Concise,-exec -e"$x =~ $re"
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <#> gvsv[*x] s
4 <1> regcreset sK/1
5 <#> gvsv[*re] s
6 <|> regcomp(other->7) sK/1
7 </> match() vKS/RTIME
8 <@> leave[1 ref] vKP/REFC
-e syntax OK
1.3 microseconds seems a bit excessive, but it is not really very large.
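The per-match overhead quoted above can be recomputed from the rates in the question. This is a small sketch; the helper name extra_seconds_per_op is made up for illustration, and the rates are copied from the question's output.

```perl
use strict;
use warnings;

# Hypothetical helper: extra time per iteration of the slower variant,
# given two rates in iterations per second.
sub extra_seconds_per_op {
    my ($slow_rate, $fast_rate) = @_;
    return 1 / $slow_rate - 1 / $fast_rate;
}

# [ rate for =~ $re, rate for =~ /abc/o ], taken from the question.
my %rates = (
    FIND_AT_END   => [ 661157, 4384042 ],
    NOFIND        => [ 678702, 5117707 ],
    FIND_AT_START => [ 657765, 4268032 ],
);

for my $type (sort keys %rates) {
    my $extra = extra_seconds_per_op(@{ $rates{$type} });
    printf "%-14s %.2f microseconds\n", $type, $extra * 1e6;
}
```

Working with the reciprocal of the rate matters here: the 549% to 654% figures compare throughput, while the absolute cost per match is what shows how small the difference really is.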