Question

I found "A clever trick to prealloc memory for a string", but the following code snippet performs worse with the trick than without it (that is, with the statement vec($str, 0x100000, 8)=0; commented out).

use Time::HiRes qw( gettimeofday );
my $big = "a" x 100;
my $str = "";
vec($str, 0x100000, 8)=0;
$ts = getTS();
for ($i=0; $i < 1000000; $i ++) {
    $str = "";
    for ($j=0; $j<100; $j++) {
        $str .= $big;
    }
}
printf "took %f secs\n", getTS() - $ts;
sub getTS {
    my ($seconds, $microseconds) = gettimeofday;
    return $seconds + (0.0+ $microseconds)/1000000.0;
}

With the clever trick, it took 9.1 secs. Without the clever trick, it took 7.8 secs.

The clever trick should be faster, because it does not need to make so many reallocations. Any idea why?

Answer 1

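I suggest that you avoid clever tricks. Perl's handling of memory for strings has improved enormously over the past ten years: it now pre-extends each string in proportion to its original size, and it retains any memory it has allocated in case the program repeats the same behaviour.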
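The pre-extension can be observed from Perl itself. The following is only a minimal sketch, assuming a reasonably modern perl; `capacity` and `growth_profile` are hypothetical helper names, and the exact numbers printed depend on the perl version and the platform's allocator.

```perl
use strict;
use warnings;
use B qw( svref_2object );

# Hypothetical helper: allocated size (in bytes) of the string buffer
# behind the scalar that $_[0] refers to.
sub capacity { return svref_2object($_[0])->LEN }

# Append $chunk to an empty string $passes times and record
# [length, capacity] after every append.
sub growth_profile {
    my ($chunk, $passes) = @_;
    my $str = "";
    my @profile;
    for (1 .. $passes) {
        $str .= $chunk;
        push @profile, [ length($str), capacity(\$str) ];
    }
    return @profile;
}

# The capacity column stays ahead of the length column.
for my $row (growth_profile("a" x 100, 10)) {
    printf "length %5d  capacity %5d\n", @$row;
}
```

The gap between the two columns is headroom that later appends can use without another reallocation.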

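You can squeeze another 10% of performance out of your algorithm by using lexical variables and avoiding the C-style for loop.

Also, Time::HiRes already provides tv_interval to calculate the difference between two calls to gettimeofday.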

use strict;
use warnings 'all';

use Time::HiRes qw/ gettimeofday tv_interval /;

my $big = 'a' x 100;

my $start = [ gettimeofday ];

for my $i (1 .. 1_000_000 ) {

    my $str;

    for my $j ( 1 .. 100 ) {
        $str .= $big;
    }
}

my $end = [ gettimeofday ];

printf "took %.3f secs\n", tv_interval( $start, $end );

Output

took 8.324 secs

Incidentally, the same program run on my Pixel C tablet, which runs Android 7.1.2 on an ARM processor, reports 21.683 seconds. I think that is pretty good.

Answer 2

Your test makes no sense. Your vec only has an effect when $i=0 (the first pass of the loop has the same effect as vec for the later passes of the loop), so vec's pre-allocation only makes a difference for 1/1,000,000 of the time your program is executing! That means the 1.3s difference has nothing to do with whether $str's string buffer is pre-allocated or not.
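One way to check this claim is to look at the buffer's allocated size at the end of each outer pass, with and without the vec call. This is only a sketch: `capacity` and `capacity_per_pass` are hypothetical helpers, and the actual sizes reported vary between perl builds.

```perl
use strict;
use warnings;
use B qw( svref_2object );

# Hypothetical helper: allocated size (in bytes) of a scalar's string buffer.
sub capacity { return svref_2object($_[0])->LEN }

# Run the question's nested loop for a few outer passes and return the
# buffer capacity observed at the end of each pass.
sub capacity_per_pass {
    my ($preallocate, $outer_passes) = @_;
    my $big = "a" x 100;
    my $str = "";
    vec($str, 0x100000, 8) = 0 if $preallocate;
    my @capacities;
    for (1 .. $outer_passes) {
        $str = "";                     # resets the length, not the buffer
        $str .= $big for 1 .. 100;
        push @capacities, capacity(\$str);
    }
    return @capacities;
}

print "without vec: @{[ capacity_per_pass(0, 5) ]}\n";
print "with vec:    @{[ capacity_per_pass(1, 5) ]}\n";
```

If the capacity is the same at the end of every pass in both variants, the buffer only grows during the first pass, which is all the vec call could ever save.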

Did you just run each test once? That's not an appropriate way of doing a benchmark! If you run a proper test, you'll see that pre-allocating doesn't help —the gain is so minor it gets lost— but it doesn't hurt either; it simply has no effect.

                Rate  deoptimized     baseline preallocated
deoptimized  78084/s           --          -1%          -1%
baseline     78668/s           1%           --          -0%
preallocated 78928/s           1%           0%           --

Test:

use strict;
use warnings;

use Benchmark qw( cmpthese );

my $big = "a" x 100;

my $preallocated;
vec($preallocated, 0x100000, 8)=0;

cmpthese(-3, {
   deoptimized => sub {
      undef(my $str);
      $str .= $big for 1..100;
   },
   baseline => sub {
      my $str;
      $str .= $big for 1..100;
   },
   preallocated => sub {
      $preallocated = "";
      $preallocated .= $big for 1..100;
   },
});

I'm not saying pre-allocating never helps. There could be scenarios where it does —larger numbers?— just not here.

One of the reasons it has little effect is that Perl allocates exponentially more memory, which is to say the number of allocations increases only logarithmically as the loop sizes grow. The following shows only 21 reallocs for the 100 loop passes:

use strict;
use warnings;
use feature qw( say );

use B qw( svref_2object );

sub SvLEN(\$) { svref_2object($_[0])->LEN }

my $big = "a" x 100;

my $str = "";
my $incs = 0;
for (1..100) {
   my $len1 = SvLEN($str);
   $str .= $big;
   my $len2 = SvLEN($str);
   my $len_inc = $len2 - $len1;
   #say $len1, " ", $len_inc;
   ++$incs if $len_inc;
}

say $incs;  # 21
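To see how the count scales, the same measurement can be repeated for larger loop sizes. The sketch below reuses the approach above; `capacity` and `count_reallocs` are hypothetical names, and the counts depend on the perl version, so no expected output is given.

```perl
use strict;
use warnings;
use B qw( svref_2object );

# Hypothetical helper: allocated size (in bytes) of a scalar's string buffer.
sub capacity { return svref_2object($_[0])->LEN }

# Count how many of $passes appends changed the buffer's allocated size.
sub count_reallocs {
    my ($chunk, $passes) = @_;
    my $str = "";
    my $reallocs = 0;
    for (1 .. $passes) {
        my $before = capacity(\$str);
        $str .= $chunk;
        $reallocs++ if capacity(\$str) != $before;
    }
    return $reallocs;
}

printf "%6d appends: %d reallocs\n", $_, count_reallocs("a" x 100, $_)
    for 100, 1_000, 10_000;
```

If growth is exponential, multiplying the number of appends by ten should add only a roughly constant number of reallocations rather than multiplying them by ten.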

Answer 3

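Calling vec() is an extra operation; you have to be saving a lot of realloc data-moving for it to be worthwhile. I'm not sure why you have nested loops in your code; any reallocs that are needed happen only on the first run through the inner loop, not on later runs of it. Benchmarking my code, adjusted so that vec only allocates as big a buffer as you actually need, shows the vec version to be slightly slower: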

use strict;
use warnings;
use Benchmark 'cmpthese';

cmpthese( 10, {
    'with_vec' => sub {
        my $big = "a" x 100;
        my $str;
        undef $str; # start with no string buffer for benchmarking purposes
        vec($str, 9999, 8)=0;
        for (my $i=0; $i < 1000000; $i ++) {
            $str = "";
            for (my $j=0; $j<100; $j++) {
                $str .= $big;
            }
        }
    },
    'without_vec' => sub {
        my $big = "a" x 100;
        my $str;
        undef $str; # start with no string buffer for benchmarking purposes
        for (my $i=0; $i < 1000000; $i ++) {
            $str = "";
            for (my $j=0; $j<100; $j++) {
                $str .= $big;
            }
        }
    },
});

Yields:

            s/iter without_vec    with_vec
without_vec   8.43          --         -3%
with_vec      8.15          3%          --

(though occasionally with_vec is faster)

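(The undef $str forces the code to start with a fresh string buffer each time; without it, $str's buffer would grow to its maximum size the first time Benchmark runs the code and stay that size afterwards.)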
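The difference between the two ways of resetting a string can be sketched as follows. `capacity` and `capacity_around_reset` are hypothetical helpers written for this illustration, and the non-zero sizes depend on the perl build.

```perl
use strict;
use warnings;
use B qw( svref_2object );

# Hypothetical helper: allocated size (in bytes) of a scalar's string buffer.
sub capacity { return svref_2object($_[0])->LEN }

# Build a 10,000-byte string, reset it either with undef ('undef') or by
# assigning "" (anything else), and return the buffer capacity before and
# after the reset.
sub capacity_around_reset {
    my ($how) = @_;
    my $str = "";
    $str .= "a" x 100 for 1 .. 100;
    my $before = capacity(\$str);
    if ($how eq 'undef') { undef $str } else { $str = "" }
    return ($before, capacity(\$str));
}

printf "%-5s reset: capacity %d -> %d\n", $_, capacity_around_reset($_)
    for 'empty', 'undef';
```

Assigning "" leaves the allocated buffer in place, while undef releases it, so only the undef variant makes every benchmark iteration pay for allocation again.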

Here is a tweaked example where pre-allocation does make a difference:

cmpthese( -10, {
    'with_vec' => sub {
        my $big = "a" x 1;
        my $str;
        undef $str;
        vec($str, 9999999, 8)=0;
        $str = "";
        for (my $j=0; $j<10000000; $j++) {
            $str .= $big;
        }
    },
    'without_vec' => sub {
        my $big = "a" x 1;
        my $str;
        undef $str;
        $str = "";
        for (my $j=0; $j<10000000; $j++) {
            $str .= $big;
        }
    },
});

Yields:

              Rate    with_vec without_vec
with_vec    1.29/s          --         -3%
without_vec 1.33/s          3%          --

(though the results are erratic; even so, without_vec is faster about a third of the time).

Perl performance: why does the clever trick perform worse?
