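I am trying to find the fastest way to filter a long array by looking for a substring in its members, using the following approaches:
- $str =~ /\.xml/ - finds ".xml" anywhere in the string
- $str =~ /\.xml$/ - finds ".xml" at the end of the string
- substr($str,-4) eq ".xml" - are the last 4 characters ".xml"?
- rindex($str, ".xml") - is there any ".xml"?
- (length($str) - rindex($str,".xml")) == 4 - are the last 4 characters ".xml"?
I tried all of the above with while/if/push and with grep, using the following code (updated with ideas from the comments):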
use 5.016;
use warnings;
use Benchmark qw(:all);
my $nmax = 5_000_000;
my @list = map { sprintf "a%s.%s", int(rand(100000000)), (int(rand(2))%2?"txt":"xml") } 1..$nmax;
cmpthese(10, {
'whl_match' => sub { my @xml; while(my ($i, $x) = each @list) { push(@xml, $x) if( $x =~ /\.xml/ )}; },
'whl_matchend' => sub { my @xml; while(my ($i, $x) = each @list) { push(@xml, $x) if( $x =~ /\.xml$/ )}; },
'whl_matchendz'=> sub { my @xml; while(my ($i, $x) = each @list) { push(@xml, $x) if( $x =~ /\.xml\z/ )}; },
'whl_substr' => sub { my @xml; while(my ($i, $x) = each @list) { push(@xml, $x) if( substr($x,-4) eq ".xml" )}; },
'whl_rindex' => sub { my @xml; while(my ($i, $x) = each @list) { push(@xml, $x) if( rindex($x,".xml") >= 0 )}; },
'whl_lenrindex'=> sub { my @xml; while(my ($i, $x) = each @list) { push(@xml, $x) if((length($x)-rindex($x,".xml"))==4)};},
'for_match' => sub { my @xml; for my $x (@list) { push(@xml, $x) if( $x =~ /\.xml/ )}; },
'for_matchend' => sub { my @xml; for my $x (@list) { push(@xml, $x) if( $x =~ /\.xml$/ )}; },
'for_matchendz'=> sub { my @xml; for my $x (@list) { push(@xml, $x) if( $x =~ /\.xml\z/ )}; },
'for_substr' => sub { my @xml; for my $x (@list) { push(@xml, $x) if( substr($x,-4) eq ".xml" )}; },
'for_rindex' => sub { my @xml; for my $x (@list) { push(@xml, $x) if( rindex($x,".xml") >= 0 )}; },
'for_lenrindex'=> sub { my @xml; for my $x (@list) { push(@xml, $x) if((length($x)-rindex($x,".xml"))==4)};},
'grp_match' => sub { my @xml = grep { /\.xml/ } @list; },
'grp_matchend' => sub { my @xml = grep { /\.xml$/ } @list; },
'grp_matchendz'=> sub { my @xml = grep { /\.xml\z/ } @list; },
'grp_substr' => sub { my @xml = grep { substr($_,-4) eq ".xml" } @list; },
'grp_rindex' => sub { my @xml = grep { rindex($_,".xml") >= 0 } @list; },
'grp_lenrindex'=> sub { my @xml = grep { (length($_) - rindex($_,".xml")) == 4 } @list; },
});
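The tests compared in this benchmark are not equivalent predicates, which matters when picking the fastest one. The following is a minimal sketch that applies each of them to a few sample strings; the sample strings and the %test hash are illustrative assumptions, not part of the benchmark.

```perl
use strict;
use warnings;

# Each predicate from the benchmark, wrapped so it can be applied to a sample string.
# The hash name and keys are hypothetical, chosen to mirror the benchmark labels.
my %test = (
    match     => sub { $_[0] =~ /\.xml/   ? 1 : 0 },
    matchend  => sub { $_[0] =~ /\.xml$/  ? 1 : 0 },
    matchendz => sub { $_[0] =~ /\.xml\z/ ? 1 : 0 },
    substr    => sub { substr($_[0], -4) eq ".xml" ? 1 : 0 },
    rindex    => sub { rindex($_[0], ".xml") >= 0 ? 1 : 0 },
    lenrindex => sub { (length($_[0]) - rindex($_[0], ".xml")) == 4 ? 1 : 0 },
);

# Illustrative inputs: a plain match, ".xml" in the middle, and a short string without ".xml".
my @samples = ("a1.xml", "a1.xml.txt", "abc");

for my $s (@samples) {
    printf "%-12s %s\n", $s,
        join " ", map { "$_=" . $test{$_}->($s) } sort keys %test;
}
```

Note that the length/rindex test accepts "abc": rindex returns -1 when nothing is found, and 3 - (-1) is 4. With the generated list in the benchmark this case cannot occur, but it can with arbitrary input.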
The results on my notebook:
s/iter whl_matchend whl_matchendz grp_matchendz grp_matchend whl_lenrindex whl_match whl_substr grp_match whl_rindex for_matchend for_matchendz for_lenrindex for_match grp_lenrindex for_substr for_rindex grp_substr grp_rindex
whl_matchend 4.48 -- -0% -10% -12% -17% -21% -24% -25% -32% -47% -47% -67% -70% -70% -73% -73% -77% -78%
whl_matchendz 4.46 0% -- -9% -11% -17% -21% -23% -25% -32% -46% -46% -67% -69% -70% -73% -73% -76% -78%
grp_matchendz 4.05 11% 10% -- -2% -9% -13% -15% -17% -25% -41% -41% -63% -66% -67% -70% -70% -74% -76%
grp_matchend 3.96 13% 13% 2% -- -6% -11% -14% -15% -24% -40% -40% -62% -66% -66% -70% -70% -73% -75%
whl_lenrindex 3.70 21% 21% 9% 7% -- -5% -8% -9% -18% -35% -35% -60% -63% -64% -67% -67% -72% -73%
whl_match 3.53 27% 27% 15% 12% 5% -- -3% -5% -14% -32% -32% -58% -61% -62% -66% -66% -70% -72%
whl_substr 3.42 31% 30% 18% 16% 8% 3% -- -2% -12% -30% -30% -57% -60% -61% -65% -65% -69% -71%
grp_match 3.36 33% 33% 20% 18% 10% 5% 2% -- -10% -29% -29% -56% -59% -60% -64% -64% -69% -71%
whl_rindex 3.02 48% 48% 34% 31% 22% 17% 13% 11% -- -21% -21% -51% -55% -56% -60% -60% -65% -67%
for_matchend 2.40 87% 86% 69% 65% 55% 47% 43% 40% 26% -- -0% -38% -43% -44% -50% -50% -56% -59%
for_matchendz 2.39 87% 87% 69% 65% 55% 47% 43% 40% 26% 0% -- -38% -43% -44% -50% -50% -56% -59%
for_lenrindex 1.49 201% 200% 172% 166% 149% 137% 130% 126% 103% 61% 61% -- -8% -10% -19% -19% -29% -33%
for_match 1.36 229% 227% 197% 191% 172% 159% 151% 146% 122% 76% 76% 9% -- -2% -11% -12% -23% -27%
grp_lenrindex 1.33 237% 236% 204% 198% 178% 165% 157% 153% 127% 80% 80% 12% 2% -- -9% -9% -21% -26%
for_substr 1.21 271% 270% 235% 228% 207% 192% 184% 178% 150% 98% 98% 23% 13% 10% -- -0% -13% -18%
for_rindex 1.20 272% 271% 236% 229% 208% 193% 184% 179% 151% 99% 99% 23% 13% 10% 0% -- -13% -18%
grp_substr 1.05 326% 325% 285% 277% 252% 235% 226% 220% 188% 128% 128% 41% 30% 27% 15% 15% -- -6%
grp_rindex 0.990 352% 351% 309% 300% 274% 256% 246% 239% 205% 142% 142% 50% 38% 34% 22% 22% 6% --
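I repeated the test several times and always got the ordering above.
I expected grep to be about as fast as while/if/push, but the following surprised me. Compare:
             s/iter
whl_matchend   4.54
grp_matchend   3.98
Here grep is only slightly faster than the equivalent while/if/push. But in the next pair:
whl_substr     3.23
grp_substr     1.05
grep is 3 times faster than while/if/push.
So why is grep 3 times faster than while/if/push when the test is a substr, but not 3 times faster when the test is a /regex-match/? The same can be seen for any "string operation".
In other words, grep {/regex/} gains only a little speed over while/if/push, but grep {substr} gains a lot. Why?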
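One way to look into this is to time the three loop constructs with the same very cheap predicate, so that the differences mostly reflect loop overhead rather than the cost of the test itself. This is only a sketch under assumptions: the list size, the predicate length($x) > 0 and the %loop hash are arbitrary choices for illustration, and the numbers will differ between machines and perl versions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# A small list so the sketch runs quickly; the size is an arbitrary assumption.
my @list = map { "a$_." . ($_ % 2 ? "txt" : "xml") } 1 .. 100_000;

# The same cheap predicate in all three loops. Each sub returns the number of kept items.
my %loop = (
    while_each => sub {
        my @out;
        while (my ($i, $x) = each @list) { push @out, $x if length($x) > 0 }
        return scalar @out;
    },
    for_loop => sub {
        my @out;
        for my $x (@list) { push @out, $x if length($x) > 0 }
        return scalar @out;
    },
    grep_block => sub {
        my @out = grep { length($_) > 0 } @list;
        return scalar @out;
    },
);

# A negative count asks Benchmark to run each entry for at least that many CPU seconds.
cmpthese(-1, \%loop);
```

If the gap between the loops here is similar in absolute terms to the gap seen with substr or rindex, the loop construct itself is a likely contributor; if not, the difference lies elsewhere.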
Another surprise (at least for me): why is $str =~ /\.xml/ faster than $str =~ /\.xml$/? I expected that specifying the end-of-string anchor $ would speed up the regex, because it would not need to search the whole string, but that was a wrong assumption, as the next test shows:
use 5.016;
use warnings;
use Benchmark qw(:all);
my $str = "a38877283.xml";
cmpthese(10, {
'match' => sub { $str =~ /\.xml/ for (1..5_000_000) },
'matchend' => sub { $str =~ /\.xml$/ for (1..5_000_000) },
'matchendz' => sub { $str =~ /\.xml\z/ for (1..5_000_000) }, #updated the \z
});
With perl 5, version 20, subversion 0 (v5.20.0) built for darwin-2level (perlbrew):
          s/iter matchend matchendz match
matchend    2.32       --       -1%  -64%
matchendz   2.30       1%        --  -63%
match      0.844     175%      173%    --
With perl 5, version 16, subversion 2 (v5.16.2) built for darwin-thread-multi-2level (default OS X):
             Rate matchendz match matchend
matchendz 0.405/s        --  -69%     -70%
match      1.29/s      218%    --      -5%
matchend   1.36/s      235%    5%       --
The old perl is faster. ;)
OS:
Darwin jabko.local 13.3.0 Darwin Kernel Version 13.3.0: Tue Jun 3 21:27:35 PDT 2014; root:xnu-2422.110.17~1/RELEASE_X86_64 x86_64
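To see what the regex engine actually does with an anchored pattern, the core re pragma can dump compilation and matching details. This is a minimal sketch, not a diagnosis: the output format differs between perl versions and goes to STDERR, and the $hit variable is only there for illustration.

```perl
use strict;
use warnings;

my $str = "a38877283.xml";
my $hit;
{
    # Lexically scoped: regexes compiled inside this block report
    # their compiled program and match steps on STDERR.
    use re 'debug';
    $hit = $str =~ /\.xml$/ ? 1 : 0;
}
print "matched: $hit\n";
```

Running the same block once per perl version and once per pattern (/\.xml/, /\.xml$/, /\.xml\z/) allows the reported optimizations to be compared side by side.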
Answer 0 (score: 2)
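Regarding the last question, try \z instead of $, because \z matches only at the very end of the string, while $ also matches before an optional trailing newline (see perldoc perlre).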
use Benchmark qw(:all);
my $str = "a38877283.xml";
cmpthese(10, {
'match' => sub { $str =~ /\.xml/ for (1..5_000_000) },
'matchend' => sub { $str =~ /\.xml$/ for (1..5_000_000) },
'matchend2' => sub { $str =~ /\.xml\z/ for (1..5_000_000) },
});
Output:
Rate matchend2 matchend match
matchend2 0.473/s -- -58% -59%
matchend 1.14/s 140% -- -1%
match 1.15/s 143% 1% --
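The semantic difference between $ and \z mentioned at the start of this answer can be checked directly. A minimal sketch, assuming a second sample string with a trailing newline (the variable names are illustrative):

```perl
use strict;
use warnings;

my $plain   = "a38877283.xml";
my $newline = "a38877283.xml\n";   # same name with a trailing newline

for my $s ($plain, $newline) {
    (my $shown = $s) =~ s/\n/\\n/;   # make the newline visible when printing
    printf '%-18s $: %d  \z: %d' . "\n", $shown,
        ($s =~ /\.xml$/  ? 1 : 0),
        ($s =~ /\.xml\z/ ? 1 : 0);
}
```

Both anchors accept the plain string; only $ accepts the one ending in a newline, so the two patterns are interchangeable only when the input is known to have no trailing newline.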