Why do these two RegEx benchmarks differ so much?

Time: 2011-10-01 14:12:17

Tags: regex performance perl

Why do these two RegEx benchmarks differ so much? They use the same regex, one written inline and one stored via qr//.

Results:

                          Rate rege1.FIND_AT_END    rege2.FIND_AT_END
rege1.FIND_AT_END     661157/s                   --                 -85%
rege2.FIND_AT_END    4384042/s                 563%                   --
                          Rate rege1.NOFIND         rege2.NOFIND
rege1.NOFIND          678702/s                   --                 -87%
rege2.NOFIND         5117707/s                 654%                   --
                          Rate rege1.FIND_AT_START  rege2.FIND_AT_START
rege1.FIND_AT_START   657765/s                   --                 -85%
rege2.FIND_AT_START  4268032/s                 549%                   --

# Benchmark
use Benchmark qw(:all);

my $count = 10000000;
my $re = qr/abc/o;
my %tests = (
    "NOFIND        " => "cvxcvidgds.sdfpkisd[s"
   ,"FIND_AT_END   " => "cvxcvidgds.sdfpabcd[s"
   ,"FIND_AT_START " => "abccvidgds.sdfpkisd[s"
);

foreach my $type (keys %tests) {
    my $str = $tests{$type};
    cmpthese($count, {
        "rege1.$type" => sub { my $idx = ($str =~ $re); },
        "rege2.$type" => sub { my $idx = ($str =~ /abc/o); }
    });
}

2 answers:

Answer 0 (score: 3)

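You are dealing with an operation that is inherently very fast, so you need to run a few more tests to narrow down where the time is going. I have also switched the benchmark model from external (letting cmpthese do the repetition) to internal (a for loop inside each sub) as a way of magnifying the speed difference. This minimizes the overhead of the subroutine call and of any work cmpthese itself has to do. Finally, it is important to test whether the difference is proportional to the magnitude of the loop (in this case it is not).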

use strict;
use warnings;
use feature 'say';    # say is used below and is not enabled by default
use Benchmark 'cmpthese';

my $re = qr/abc/o;
my %tests = (
   'fail ' => 'cvxcvidgds.sdfpkisd[s',
   'end  ' => 'cvxcvidgds.sdfpabcd[s',
   'start' => 'abccvidgds.sdfpkisd[s',
);

for my $mag (map 10**$_, 1 .. 5) {
    say "\n$mag:";
    for my $type (keys %tests) {
        my $str = $tests{$type};
        cmpthese -1, {
            '$re    '.$type => sub {my $i; $i = ($str =~ $re   ) for 0 .. $mag},
            '/abc/o '.$type => sub {my $i; $i = ($str =~ /abc/o) for 0 .. $mag},
            '/$re/  '.$type => sub {my $i; $i = ($str =~ /$re/ ) for 0 .. $mag},
            '/$re/o '.$type => sub {my $i; $i = ($str =~ /$re/o) for 0 .. $mag},
        }
    }
}

Results:

10:
                 Rate $re    fail  /$re/  fail  /$re/o fail  /abc/o fail 
$re    fail  106390/s           --          -8%         -72%         -74%
/$re/  fail  115814/s           9%           --         -70%         -71%
/$re/o fail  384635/s         262%         232%           --          -5%
/abc/o fail  403944/s         280%         249%           5%           --
                 Rate $re    end   /$re/  end   /$re/o end   /abc/o end  
$re    end   105527/s           --          -5%         -71%         -72%
/$re/  end   110902/s           5%           --         -69%         -71%
/$re/o end   362544/s         244%         227%           --          -5%
/abc/o end   382242/s         262%         245%           5%           --
                 Rate $re    start /$re/  start /$re/o start /abc/o start
$re    start 111002/s           --          -3%         -72%         -73%
/$re/  start 114094/s           3%           --         -71%         -73%
/$re/o start 390693/s         252%         242%           --          -6%
/abc/o start 417123/s         276%         266%           7%           --

100:
                Rate /$re/  fail  $re    fail  /$re/o fail  /abc/o fail 
/$re/  fail  12329/s           --          -4%         -77%         -79%
$re    fail  12789/s           4%           --         -76%         -78%
/$re/o fail  53194/s         331%         316%           --          -9%
/abc/o fail  58377/s         373%         356%          10%           --
                Rate $re    end   /$re/  end   /$re/o end   /abc/o end  
$re    end   12440/s           --          -1%         -75%         -77%
/$re/  end   12623/s           1%           --         -75%         -77%
/$re/o end   50127/s         303%         297%           --          -7%
/abc/o end   53941/s         334%         327%           8%           --
                Rate $re    start /$re/  start /$re/o start /abc/o start
$re    start 12810/s           --          -3%         -76%         -78%
/$re/  start 13190/s           3%           --         -75%         -77%
/$re/o start 52512/s         310%         298%           --          -8%
/abc/o start 57045/s         345%         332%           9%           --

1000:
               Rate $re    fail  /$re/  fail  /$re/o fail  /abc/o fail 
$re    fail  1248/s           --          -8%         -76%         -80%
/$re/  fail  1354/s           9%           --         -74%         -79%
/$re/o fail  5284/s         323%         290%           --         -16%
/abc/o fail  6311/s         406%         366%          19%           --
               Rate $re    end   /$re/  end   /$re/o end   /abc/o end  
$re    end   1316/s           --          -1%         -74%         -77%
/$re/  end   1330/s           1%           --         -74%         -77%
/$re/o end   5119/s         289%         285%           --         -11%
/abc/o end   5757/s         338%         333%          12%           --
               Rate /$re/  start $re    start /$re/o start /abc/o start
/$re/  start 1283/s           --          -1%         -75%         -81%
$re    start 1302/s           1%           --         -75%         -80%
/$re/o start 5119/s         299%         293%           --         -22%
/abc/o start 6595/s         414%         406%          29%           --

10000:
              Rate /$re/  fail  $re    fail  /$re/o fail  /abc/o fail 
/$re/  fail  130/s           --          -6%         -76%         -80%
$re    fail  139/s           7%           --         -74%         -79%
/$re/o fail  543/s         317%         291%           --         -17%
/abc/o fail  651/s         400%         368%          20%           --
              Rate /$re/  end   $re    end   /$re/o end   /abc/o end  
/$re/  end   128/s           --          -3%         -76%         -79%
$re    end   132/s           3%           --         -76%         -78%
/$re/o end   541/s         322%         311%           --         -10%
/abc/o end   598/s         366%         354%          11%           --
              Rate /$re/  start $re    start /$re/o start /abc/o start
/$re/  start 132/s           --          -1%         -77%         -80%
$re    start 133/s           1%           --         -76%         -79%
/$re/o start 566/s         330%         325%           --         -13%
/abc/o start 650/s         394%         388%          15%           --

100000:
               Rate /$re/  fail  $re    fail  /$re/o fail  /abc/o fail 
/$re/  fail  13.2/s           --          -8%         -76%         -78%
$re    fail  14.2/s           8%           --         -74%         -76%
/$re/o fail  55.9/s         325%         292%           --          -8%
/abc/o fail  60.5/s         360%         324%           8%           --
               Rate /$re/  end   $re    end   /$re/o end   /abc/o end  
/$re/  end   12.8/s           --          -3%         -75%         -79%
$re    end   13.2/s           3%           --         -75%         -78%
/$re/o end   52.3/s         308%         297%           --         -12%
/abc/o end   59.7/s         365%         353%          14%           --
               Rate $re    start /$re/  start /$re/o start /abc/o start
$re    start 13.4/s           --          -2%         -77%         -78%
/$re/  start 13.6/s           2%           --         -77%         -78%
/$re/o start 58.2/s         334%         328%           --          -6%
/abc/o start 62.2/s         364%         357%           7%           --

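You can easily see that the tests fall into two groups: those whose source contains /.../o and those whose source does not. Since this is a syntactic difference, it is a clue that the compiler is probably optimizing this case (or allowing the runtime to cache something). Possible mechanisms are removing the check on the variable once it has been done the first time, or simplifying the stack; it is hard to say without looking at the source.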

The results may also depend on the version of perl in use. The tests above were run on v5.10.1.

Answer 1 (score: 1)

First of all, the /o has no effect here, because nothing is interpolated into that pattern.
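For contrast, here is a minimal sketch of a case where /o does make a difference, because the pattern interpolates a variable. The variable names and test strings are hypothetical and are not taken from the question.

```perl
use strict;
use warnings;

# Hypothetical demo: match the fixed string 'abc' against two different
# interpolated patterns, with and without /o.
my (@with_o, @without_o);
for my $pat ('abc', 'xyz') {
    # With /o the pattern is compiled on first use and never re-interpolated,
    # so the second iteration still matches against /abc/.
    push @with_o,    ('abc' =~ /$pat/o ? 1 : 0);
    # Without /o the pattern follows the current value of $pat.
    push @without_o, ('abc' =~ /$pat/  ? 1 : 0);
}

print "with /o:    @with_o\n";       # with /o:    1 1
print "without /o: @without_o\n";    # without /o: 1 0
```

A literal pattern such as /abc/ has nothing to re-interpolate, so adding /o to it changes nothing.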

Now to answer the question.

1/661157 s - 1/4384042 s = 0.000,001,3 s
1/678702 s - 1/5117707 s = 0.000,001,3 s
1/657765 s - 1/4268032 s = 0.000,001,3 s
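The subtraction above can be reproduced with a few lines of Perl. This is only a sketch: the rates are copied from the question's results, and the array layout and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Rates (matches per second) copied from the question's results:
# [ label, rate for =~ $re, rate for =~ /abc/o ]
my @rates = (
    [ 'FIND_AT_END',   661157, 4384042 ],
    [ 'NOFIND',        678702, 5117707 ],
    [ 'FIND_AT_START', 657765, 4268032 ],
);

my @extra_us;
for my $row (@rates) {
    my ($label, $slow, $fast) = @$row;
    # Time per match is the reciprocal of the rate; scale seconds to microseconds.
    my $diff = (1 / $slow - 1 / $fast) * 1e6;
    push @extra_us, sprintf('%.1f', $diff);
    printf "%-14s %.1f us extra per match\n", $label, $diff;
}
```

Each row comes out at roughly 1.3 microseconds, so the absolute cost per match is about the same whether or not the string matches.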

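So =~ $re takes an extra 1.3 microseconds (or 0.68 on my machine). There are three extra Perl ops in the =~ $re case, and they account for part of it. I am not sure why all three are there, though. One fetches $re, but I do not know what the other two do.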

>perl -MO=Concise,-exec -e"$x =~ /abc/"
1  <0> enter
2  <;> nextstate(main 1 -e:1) v:{
3  <#> gvsv[*x] s
4  </> match(/"abc"/) vKS/RTIME
5  <@> leave[1 ref] vKP/REFC
-e syntax OK

>perl -MO=Concise,-exec -e"$x =~ $re"
1  <0> enter
2  <;> nextstate(main 1 -e:1) v:{
3  <#> gvsv[*x] s
4  <1> regcreset sK/1
5  <#> gvsv[*re] s
6  <|> regcomp(other->7) sK/1
7  </> match() vKS/RTIME
8  <@> leave[1 ref] vKP/REFC
-e syntax OK

1.3 microseconds does seem a bit excessive, but it is not actually a large amount of time.