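Interpreting "timethese" results for search algorithms

Date: 2016-07-08 18:26:46

Tags: perl search

I'm a bit confused by the following results, and I hope some of you can shed light on why linear search appears to be faster than binary and interpolation search in Perl.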

Benchmark: timing 1000000 iterations of Binary, Interpolation, Linear...
    Binary: 17 wallclock secs (16.33 usr +  0.00 sys = 16.33 CPU) @ 61236.99/s (n=1000000)
Interpolation:  4 wallclock secs ( 3.65 usr +  0.00 sys =  3.65 CPU) @ 273972.60/s (n=1000000)
    Linear:  2 wallclock secs ( 1.52 usr +  0.00 sys =  1.52 CPU) @ 657894.74/s (n=1000000)

Each function is shown below. I'm trying to write up a bunch of well-known algorithms as I work through Mastering Algorithms with Perl.

sub LinearSearch {
    # Search linearly for a value
    my $val = $_[0];
    my $arrptr = $_[1];

    for (my $i=0; $i<ARR_LENGTH; ++$i) {
        if ($arrptr->[$i] == $val) {
            return $i;
        }
    }

    return -1;
}


sub BinarySearch {
    my $val = $_[0];
    my $arrptr = $_[1];

    my $low = 0;
    my $high = ARR_LENGTH;  # to be modified

    while ($low <= $high) {
        my $middle = int(($low + $high) / 2);
        my $midValue = $arrptr->[$middle];

        if ($midValue < $val) {
            $low = $middle + 1;
        } elsif ($midValue > $val) {
            $high = $middle - 1;
        } else {
            return $middle;
        }
    }

    return -1;
}


sub InterpolationSearch {
    my $val = $_[0];
    my $arrptr = $_[1];

    my $low = 0;
    my $high = ARR_LENGTH;  # to be modified

    while ($val >= $arrptr->[$low] && $val <= $arrptr->[$high]) {
        # solve for the middle value again
        my $middle = int($low + ($high - $low)*(($val - @{$arrptr}[$low]) 
            / (@{$arrptr}[$high] - @{$arrptr}[$low] + 1)));

        my $middleVal = $arrptr->[$middle];

        if ($middleVal < $val) {
            $low = $middle + 1;
        } elsif ($middleVal > $val) {
            $high = $middle - 1;
        } else {
            return $middle;
        }
    }
    return -1;      # Not found
}

Also, ARR_LENGTH is defined as

use constant ARR_LENGTH => 10_000;

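at the top of the script. What's odd is that binary search takes so long, and interpolation search less so, but still about twice as long as linear search.

Benchmarking code (as I found it online):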

my @array = OrderedArray();
my $random_val = $array[int(rand(ARR_LENGTH))];
timethese(1_000_000, {
    Interpolation => 'InterpolationSearch($random_val, \@array)',
    Binary        => 'BinarySearch($random_val, \@array)',
    Linear        => 'LinearSearch($random_val, \@array)' }
);

where OrderedArray() is just a quick (and probably unnecessary) function

sub OrderedArray {
    # Create a random ordered array
    my @arr;

    for (my $i=1; $i<=ARR_LENGTH; ++$i) {
        push @arr, $i;
    }

    return @arr;
}

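3 Answers:

Answer 0: (score: 4)

When you pass strings to timethese instead of sub references, your lexical @array and $random_val variables are not in scope for Benchmark. So it isn't actually running with the data you specified.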

Try running it as:

use Benchmark 'timethese';

use strict;
use warnings 'all';

use constant ARR_LENGTH => 10000;

my @array = OrderedArray();
my $random_val = $array[int(rand(ARR_LENGTH))];
timethese(
    -5,
    {
        'Interpolation' => sub { InterpolationSearch($random_val, \@array) },
        'Binary' => sub { BinarySearch($random_val, \@array) },
        'Linear' => sub { LinearSearch($random_val, \@array) },
    }
);
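
The scoping problem described above can be reproduced without Benchmark at all. The following is a minimal sketch, assuming a hypothetical helper `run_string` that stands in for any module that evaluates a caller-supplied code string from its own lexical scope; it is not Benchmark's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a module that evals code strings. It is defined
# before the caller's lexical variables, so the eval cannot see them.
sub run_string {
    my ($code) = @_;
    no strict 'vars';    # unknown names fall back to package variables
    no warnings;
    return eval $code;
}

my @array      = (10, 20, 30);
my $random_val = 20;

# The string is compiled inside run_string, where the lexicals do not exist.
my $defined_in_string = run_string('defined $random_val ? 1 : 0');
my $count_in_string   = run_string('scalar @array');

# The closure is compiled here, so it captures the lexicals.
my $defined_in_sub = sub { defined $random_val ? 1 : 0 }->();
my $count_in_sub   = sub { scalar @array }->();

print "string: defined=$defined_in_string, elements=$count_in_string\n";
print "sub:    defined=$defined_in_sub, elements=$count_in_sub\n";
```

The string version sees an undefined value and an empty array, while the sub reference sees the real data.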

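Enabling warnings reveals a bug in InterpolationSearch. Enabling warnings and setting $random_val to ARR_LENGTH + 1 reveals a bug in BinarySearch. Before worrying about benchmarks, you might consider writing some test cases and verifying the code.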
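
As a rough illustration of the kind of test cases meant here, the sketch below uses the core Test::More module. The function `binary_search_checked` is a hypothetical example written for this sketch, not the asker's code; it assumes the upper bound should start at the last valid index.

```perl
use strict;
use warnings 'all';
use Test::More;

# Illustrative binary search (an assumption, not the code from the question):
# $high starts at the last valid index rather than at the array length.
sub binary_search_checked {
    my ($target, $array) = @_;
    my ($low, $high) = (0, $#$array);

    while ($low <= $high) {
        my $mid = int(($low + $high) / 2);
        if    ($array->[$mid] < $target) { $low  = $mid + 1 }
        elsif ($array->[$mid] > $target) { $high = $mid - 1 }
        else                             { return $mid }
    }
    return -1;    # not found
}

my @sorted = (1 .. 100);

is(binary_search_checked(1,   \@sorted),  0, 'first element');
is(binary_search_checked(100, \@sorted), 99, 'last element');
is(binary_search_checked(42,  \@sorted), 41, 'middle element');
is(binary_search_checked(0,   \@sorted), -1, 'value below the range');
is(binary_search_checked(101, \@sorted), -1, 'value above the range');
is(binary_search_checked(5,   []),       -1, 'empty array');

done_testing;
```

The boundary cases (first, last, out of range, empty) are the ones most likely to expose an off-by-one in the initial bounds.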

You may prefer cmpthese to timethese; I don't find the output of timethese useful.

Answer 1: (score: 1)
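
For comparison, a minimal cmpthese sketch might look like the following. The helpers `linear_demo` and `binary_demo`, the array size, and the target value are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

my @sorted = (1 .. 1_000);
my $target = 900;

# Illustrative helpers, not the code from the question.
sub linear_demo {
    my ($val, $arr) = @_;
    for my $i (0 .. $#$arr) {
        return $i if $arr->[$i] == $val;
    }
    return -1;
}

sub binary_demo {
    my ($val, $arr) = @_;
    my ($low, $high) = (0, $#$arr);
    while ($low <= $high) {
        my $mid = int(($low + $high) / 2);
        if    ($arr->[$mid] < $val) { $low  = $mid + 1 }
        elsif ($arr->[$mid] > $val) { $high = $mid - 1 }
        else                        { return $mid }
    }
    return -1;
}

# Run each candidate for at least 1 CPU second, then print a table of
# rates and relative percentage differences.
my $rows = cmpthese(-1, {
    Linear => sub { linear_demo($target, \@sorted) },
    Binary => sub { binary_demo($target, \@sorted) },
});
```

cmpthese prints a single table with one row per candidate, which tends to be easier to read than the per-candidate lines that timethese emits.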

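The question of why your timings weren't as expected has already been answered, but I thought you might like to see a more Perlish implementation of the three search algorithms, along with their timings.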

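Note that all of the functions provided by Benchmark accept a negative number as their first parameter, which indicates the number of CPU seconds to run each benchmark for. This is usually a better way of controlling the number of executions than guessing whether you need a hundred thousand or a million to get a decent sample.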
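
A small sketch of the negative-count form, assuming a throwaway workload named `sum_loop` that exists only for this illustration:

```perl
use strict;
use warnings;
use Benchmark 'timethese';

my @data = (1 .. 1_000);

# A negative count means "run for at least this many CPU seconds".
# The 'none' style suppresses the printed report so the result can be inspected.
my $results = timethese(-1, {
    sum_loop => sub { my $total = 0; $total += $_ for @data; $total },
}, 'none');

# timethese returns a hash ref of Benchmark objects keyed by name.
my $bench = $results->{sum_loop};
printf "sum_loop ran %d iterations within the time budget\n", $bench->iters;
```

Benchmark chooses the iteration count itself, so fast and slow candidates each get a comparable amount of measurement time.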

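Note also that I have set ARR_LENGTH to one million.

As you would expect, linear search is the slowest at 15 searches per second, followed by binary search at 117,018 per second, and interpolation search at 481,320 per second.

I hope this helps.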

use strict;
use warnings 'all';

use Benchmark 'timethese';

use constant ARR_LENGTH => 1_000_000;

use IO::Handle;    # needed for STDOUT->autoflush on Perl versions before 5.14
STDOUT->autoflush;

my @array = ( 0 .. ARR_LENGTH-1 );

timethese(-10, {

    interpolation_search => sub {
        my $random_val = int rand @array;
        my $i = interpolation_search($random_val, \@array);
        die "Wrong result" unless $array[$i] == $random_val;
    },

    binary_search => sub {
        my $random_val = int rand @array;
        my $i = binary_search($random_val, \@array);
        die "Wrong result" unless $array[$i] == $random_val;
    },

    linear_search => sub {
        my $random_val = int rand @array;
        my $i = linear_search($random_val, \@array);
        die "Wrong result" unless $array[$i] == $random_val;
    },

} );


sub linear_search {

    my ($target, $array) = @_;

    for my $i ( 0 .. $#$array ) {

        return $i if $array->[$i] == $target;

        last if $array->[$i] > $target;
    }

    return;
}


sub binary_search {

    my ($target, $array) = @_;

    my $low = 0;
    my $high = $#$array;

    my ($mid, $mid_val);

    while ( $low <= $high ) {

        $mid = int(($low + $high) / 2);

        $mid_val = $array->[$mid];

        return $mid if $mid_val == $target;

        if ( $mid_val < $target ) {
            $low = $mid + 1;
        }
        else {
            $high = $mid - 1;
        }
    }

    return;
}


sub interpolation_search {

    my ($target, $array) = @_;

    my $low  = 0;
    my $high = $#$array;

    while () {

        my ($low_val, $high_val) = @{$array}[$low, $high];

        if ( $low_val == $high_val) {
            last unless $low_val == $target;
            return $low;
        }
        last if $target < $low_val or $target > $high_val;

        my $delta_i = $high     - $low;
        my $delta_v = $high_val - $low_val;

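        # Truncate to an integer index; without int() a fractional $mid
        # would leak into $low and $high when the values are not evenly spaced
        my $mid = int( $low + ($target - $low_val) * $delta_i / $delta_v );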
        my $mid_val = $array->[$mid];

        return $mid if $mid_val == $target;

        if ( $mid_val < $target ) {
            $low = $mid + 1;
        }
        else {
            $high = $mid - 1;
        }
    }

    return;
}

Output

Benchmark: running binary_search, interpolation_search, linear_search for at least 10 CPU seconds...
binary_search: 10 wallclock secs (10.53 usr +  0.00 sys = 10.53 CPU) @ 117018.33/s (n=1232320)
interpolation_search: 10 wallclock secs (10.39 usr +  0.00 sys = 10.39 CPU) @ 481320.50/s (n=5000920)
linear_search: 10 wallclock secs (10.05 usr +  0.00 sys = 10.05 CPU) @ 15.03/s (n=151)

Answer 2: (score: -2)

use constant ARR_LENGTH=>10000;
for $i (0..ARR_LENGTH) {
   $arr[$i]=int(rand(15000));
}
@arr = sort {$a <=> $b}(@arr);

for $i (0..ARR_LENGTH) {
   $x=int(rand(15000));
   #LinearSearch(\@arr,$x); -- uncomment this or next
   #BinarySearch(\@arr,$x);
}

I tested the code by uncommenting one call or the other. The results are consistent:

Jay-$ vim some.pl ( uncommenting Binary Search)
Jay-$ time perl some.pl

real    0m0.096s
user    0m0.087s
sys     0m0.007s
Jay-$ time perl some.pl

real    0m0.091s
user    0m0.083s
sys     0m0.007s
Jay-$ time perl some.pl

real    0m0.100s
user    0m0.091s
sys     0m0.007s

Jay-$ vim some.pl ( uncommenting Linear Search)
Jay-$ time perl some.pl

real    0m23.163s
user    0m23.121s
sys     0m0.032s

Jay-$ time perl some.pl

real    0m22.482s
user    0m22.448s
sys     0m0.025s

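Conclusion: for n = 10000, binary search is about 200 times faster, which is consistent with O(log n). If you ran it a million times as specified, that is technically about 5 billion iterations of linear search, which is not possible in the 2 seconds you reported. Also, are the random numbers you pass in uniformly distributed?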
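
The 5 billion figure can be checked with some quick arithmetic. This sketch assumes the target is always present in the array and uniformly distributed over its positions, so a linear search inspects (n + 1) / 2 elements on average.

```perl
use strict;
use warnings;

my $n    = 10_000;       # array length from the question
my $runs = 1_000_000;    # iterations passed to timethese

# Average comparisons for a successful linear search (assumed uniform targets)
my $avg_linear   = ($n + 1) / 2;
my $total_linear = $runs * $avg_linear;

# Worst-case comparisons for binary search: floor(log2(n)) + 1
my $max_binary = int(log($n) / log(2)) + 1;

printf "average linear comparisons per search: %.1f\n", $avg_linear;
printf "total linear comparisons over all runs: %.0f\n", $total_linear;
printf "worst-case binary comparisons per search: %d\n", $max_binary;
```

Roughly 5 billion comparisons for the linear case against at most 14 per binary search shows why a 2-second linear result cannot reflect real searches over the full array.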