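I found a clever trick to preallocate memory for a string, but the following code snippet performs worse with the trick than without it (that is, with the statement below commented out).
vec($str, 0x100000, 8)=0;
With the clever trick, it took 9.1 seconds. Without the clever trick, it took 7.8 seconds.
The clever trick should be faster, because it should not need to make so many reallocations.
use Time::HiRes qw( gettimeofday );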
my $big = "a" x 100;
my $str = "";
vec($str, 0x100000, 8)=0;
$ts = getTS();
for ($i=0; $i < 1000000; $i ++) {
    $str = "";
    for ($j=0; $j<100; $j++) {
        $str .= $big;
    }
}
printf "took %f secs\n", getTS() - $ts;
sub getTS {
    my ($seconds, $microseconds) = gettimeofday;
    return $seconds + (0.0+ $microseconds)/1000000.0;
}
Any idea why?
Answer 0 (score: 5)
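I suggest you avoid clever tricks. Perl's handling of string memory has improved enormously over the past decade: it now pre-extends each string in proportion to its original size, and it keeps any memory it has allocated for when the program repeats the same behaviour.
The version below uses lexical variables and avoids the C-style for loop.
Also, Time::HiRes already provides tv_interval to calculate the interval between two gettimeofday results.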
use strict;
use warnings 'all';
use Time::HiRes qw/ gettimeofday tv_interval /;
my $big = 'a' x 100;
my $start = [ gettimeofday ];
for my $i (1 .. 1_000_000 ) {
    my $str;
    for my $j ( 1 .. 100 ) {
        $str .= $big;
    }
}
my $end = [ gettimeofday ];
printf "took %.3f secs\n", tv_interval( $start, $end );
took 8.324 secs
Incidentally, the same program run on a Pixel C tablet, which has an ARM processor and runs Android 7.1.2, reported 21.683 seconds. I think that's pretty good.
Answer 1 (score: 2)
Your test makes no sense. Your vec only has an effect when $i=0 (the first pass of the loop has the same effect as vec for the later passes of the loop), so vec's pre-allocation only makes a difference for 1/1,000,000 of the time your program is executing! That means the 1.3 s difference has nothing to do with whether $str's string buffer is pre-allocated or not.
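To make that point concrete, here is a minimal sketch that watches the string buffer directly. It assumes the core B module is available; buf_len is a hypothetical helper name introduced only for this illustration, and the three outer passes stand in for the question's one million.

```perl
use strict;
use warnings;
use B qw( svref_2object );

# Hypothetical helper: allocated size (SvLEN) of a scalar's string buffer.
sub buf_len { svref_2object($_[0])->LEN }

my $big = "a" x 100;
my $str = "";
vec($str, 0x100000, 8) = 0;        # writes byte 0x100000, so $str grows to 0x100001 bytes
my $len_after_vec = length $str;
my $prealloc      = buf_len(\$str);

my $changes = 0;                   # buffer size changes seen across all passes
for my $pass (1 .. 3) {
    $str = "";                     # empties the string but keeps the buffer
    for (1 .. 100) {
        my $before = buf_len(\$str);
        $str .= $big;
        ++$changes if buf_len(\$str) != $before;
    }
}

printf "length after vec: %d, buffer: %d bytes, size changes: %d\n",
    $len_after_vec, $prealloc, $changes;
```

Because assigning "" leaves the allocation in place, the buffer that vec created is simply reused on every pass. A buffer grown by the first pass of appends would be reused in the same way.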
Did you just run each test once? That's not an appropriate way of doing a benchmark! If you run a proper test, you'll see that pre-allocating doesn't help —the gain is so minor it gets lost— but it doesn't hurt either; it simply has no effect.
                 Rate deoptimized    baseline preallocated
deoptimized   78084/s          --         -1%          -1%
baseline      78668/s          1%          --          -0%
preallocated  78928/s          1%          0%           --
Test:
use strict;
use warnings;
use Benchmark qw( cmpthese );
my $big = "a" x 100;
my $preallocated;
vec($preallocated, 0x100000, 8)=0;
cmpthese(-3, {
    deoptimized => sub {
        undef(my $str);
        $str .= $big for 1..100;
    },
    baseline => sub {
        my $str;
        $str .= $big for 1..100;
    },
    preallocated => sub {
        $preallocated = "";
        $preallocated .= $big for 1..100;
    },
});
I'm not saying pre-allocating never helps. There could be scenarios where it does —larger numbers?— just not here.
One of the reasons it has little effect is that Perl allocates exponentially more memory, which is to say the number of allocations increases only logarithmically as the loop sizes grow. The following shows only 21 reallocs for the 100 loop passes:
use strict;
use warnings;
use feature qw( say );
use B qw( svref_2object );
sub SvLEN(\$) { svref_2object($_[0])->LEN }
my $big = "a" x 100;
my $str = "";
my $incs = 0;
for (1..100) {
    my $len1 = SvLEN($str);
    $str .= $big;
    my $len2 = SvLEN($str);
    my $len_inc = $len2 - $len1;
    #say $len1, " ", $len_inc;
    ++$incs if $len_inc;
}
say $incs; # 21
Answer 2 (score: 1)
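Calling vec() is an extra operation; you have to save a lot of realloc data-moving to make it worthwhile. I'm not sure why you have nested loops in your code; any necessary reallocs will only be done on the first run of the inner loop, not on later runs of it. My benchmark of your code, adjusted so that vec allocates only the buffer you actually need, shows the vec version slightly slower: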
use strict;
use warnings;
use Benchmark 'cmpthese';
cmpthese( 10, {
    'with_vec' => sub {
        my $big = "a" x 100;
        my $str;
        undef $str; # start with no string buffer for benchmarking purposes
        vec($str, 9999, 8)=0;
        for (my $i=0; $i < 1000000; $i ++) {
            $str = "";
            for (my $j=0; $j<100; $j++) {
                $str .= $big;
            }
        }
    },
    'without_vec' => sub {
        my $big = "a" x 100;
        my $str;
        undef $str; # start with no string buffer for benchmarking purposes
        for (my $i=0; $i < 1000000; $i ++) {
            $str = "";
            for (my $j=0; $j<100; $j++) {
                $str .= $big;
            }
        }
    },
});
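yields:
            s/iter without_vec    with_vec
without_vec   8.43          --         -3%
with_vec      8.15          3%          --
(though occasionally with_vec is faster)
(The undef $str forces the code to use a new string buffer each time; without it, $str's buffer would be expanded to its maximum size the first time Benchmark runs the code and would stay that way afterwards.)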
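The difference between emptying a string and undefining it can be checked with a short sketch. It assumes the core B module; buf_len is a hypothetical helper name used only for this illustration, not part of the benchmark above.

```perl
use strict;
use warnings;
use B qw( svref_2object );

# Hypothetical helper: allocated size (SvLEN) of a scalar's string buffer.
sub buf_len { svref_2object($_[0])->LEN }

my $str = "";
$str .= "a" x 100 for 1 .. 100;     # grow to 10,000 bytes by appending
my $grown = buf_len(\$str);

$str = "";                          # the string is empty, the allocation remains
my $after_empty = buf_len(\$str);

undef $str;                         # the buffer itself is released
my $after_undef = buf_len(\$str);

print "grown: $grown, after \"\": $after_empty, after undef: $after_undef\n";
```

This is why a benchmark that wants to measure the cost of growing a fresh buffer has to undef the variable; assigning "" would leave the allocation in place.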
Here is a tweaked example where pre-allocation does make a difference:
cmpthese( -10, {
    'with_vec' => sub {
        my $big = "a" x 1;
        my $str;
        undef $str;
        vec($str, 9999999, 8)=0;
        $str = "";
        for (my $j=0; $j<10000000; $j++) {
            $str .= $big;
        }
    },
    'without_vec' => sub {
        my $big = "a" x 1;
        my $str;
        undef $str;
        $str = "";
        for (my $j=0; $j<10000000; $j++) {
            $str .= $big;
        }
    },
});
yields:
              Rate    with_vec without_vec
with_vec    1.29/s          --         -3%
without_vec 1.33/s          3%          --
(though the results are erratic; still, the version without vec is faster about a third of the time).