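I'm a bit confused by the following results, and I hope some of you can shed light on why linear search appears to be faster than binary and interpolation search in Perl.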
Benchmark: timing 1000000 iterations of Binary, Interpolation, Linear...
Binary: 17 wallclock secs (16.33 usr + 0.00 sys = 16.33 CPU) @ 61236.99/s (n=1000000)
Interpolation: 4 wallclock secs ( 3.65 usr + 0.00 sys = 3.65 CPU) @ 273972.60/s (n=1000000)
Linear: 2 wallclock secs ( 1.52 usr + 0.00 sys = 1.52 CPU) @ 657894.74/s (n=1000000)
Each function is shown below. I'm trying to implement a number of well-known algorithms as I work through Mastering Algorithms with Perl.
sub LinearSearch {
    # Search linearly for a value
    my $val = $_[0];
    my $arrptr = $_[1];
    for (my $i=0; $i<ARR_LENGTH; ++$i) {
        if ($arrptr->[$i] == $val) {
            return $i;
        }
    }
    return -1;
}
sub BinarySearch {
    my $val = $_[0];
    my $arrptr = $_[1];
    my $low = 0;
    my $high = ARR_LENGTH; # to be modified
    while ($low <= $high) {
        my $middle = int(($low + $high) / 2);
        my $midValue = $arrptr->[$middle];
        if ($midValue < $val) {
            $low = $middle + 1;
        } elsif ($midValue > $val) {
            $high = $middle - 1;
        } else {
            return $middle;
        }
    }
    return -1;
}
sub InterpolationSearch {
    my $val = $_[0];
    my $arrptr = $_[1];
    my $low = 0;
    my $high = ARR_LENGTH; # to be modified
    while ($val >= $arrptr->[$low] && $val <= $arrptr->[$high]) {
        # solve for the middle value again
        my $middle = int($low + ($high - $low)*(($val - @{$arrptr}[$low])
            / (@{$arrptr}[$high] - @{$arrptr}[$low] + 1)));
        my $middleVal = $arrptr->[$middle];
        if ($middleVal < $val) {
            $low = $middle + 1;
        } elsif ($middleVal > $val) {
            $high = $middle - 1;
        } else {
            return $middle;
        }
    }
    return -1; # Not found
}
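In addition, ARR_LENGTH is defined at the top of the script as
use constant ARR_LENGTH => 10_000;
It is strange that binary search takes so long, and that interpolation search, although less slow, still takes about twice as long as linear search.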
The benchmarking code (much like what I found online):
my @array = OrderedArray();
my $random_val = $array[int(rand(ARR_LENGTH))];
timethese(1_000_000, {
    Interpolation => 'InterpolationSearch($random_val, \@array)',
    Binary => 'BinarySearch($random_val, \@array)',
    Linear => 'LinearSearch($random_val, \@array)' }
);
where OrderedArray() is just a quick (and probably unnecessary) function:
sub OrderedArray {
    # Create an ordered array holding 1 .. ARR_LENGTH
    my @arr;
    for (my $i=1; $i<=ARR_LENGTH; ++$i) {
        push @arr, $i;
    }
    return @arr;
}
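Answer 0 (score: 4)
When you pass a string to timethese rather than a sub reference, your lexical @array and $random_val variables are not in scope where Benchmark evaluates that string. So it isn't actually running with the data you specified.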
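The scoping effect described here can be reproduced without Benchmark. The following is a minimal sketch, assuming a hypothetical Runner package that stands in for any module evaluating a code string outside the caller's lexical scope; it is not Benchmark's actual implementation.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for a module that accepts code either as a
    # string or as a sub reference (an assumption for illustration only).
    package Runner;

    sub run_string {
        my ($code) = @_;
        no strict 'vars';    # unknown names fall back to package globals
        return eval "package main; $code";
    }

    sub run_sub {
        my ($code) = @_;
        return $code->();
    }
}

my $value = 42;    # lexical, declared after Runner's subs were compiled

# The string is compiled inside run_string, where the lexical $value is
# not visible, so it refers to the (undefined) global $main::value.
my $from_string = Runner::run_string('defined $value ? "defined" : "undef"');

# The sub reference is a closure and captures the lexical $value.
my $from_sub = Runner::run_sub(sub { defined $value ? "defined" : "undef" });

print "string: $from_string\n";    # string: undef
print "sub:    $from_sub\n";       # sub:    defined
```

The closure form keeps working however the calling module is written, which is one reason to prefer sub references when benchmarking code that uses lexical variables.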
Try running it as:
use Benchmark 'timethese';
use strict;
use warnings 'all';
use constant ARR_LENGTH => 10000;
my @array = OrderedArray();
my $random_val = $array[int(rand(ARR_LENGTH))];
timethese(
    -5,
    {
        'Interpolation' => sub { InterpolationSearch($random_val, \@array) },
        'Binary' => sub { BinarySearch($random_val, \@array) },
        'Linear' => sub { LinearSearch($random_val, \@array) },
    }
);
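Enabling warnings reveals a bug in InterpolationSearch. Enabling warnings and setting $random_val to ARR_LENGTH + 1 reveals a bug in BinarySearch. Both functions initialize $high to ARR_LENGTH, which is one past the last valid index, so they can read an undefined element. Before worrying about benchmarks, you might consider writing some test cases and verifying the code.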
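As a rough illustration of the kind of test cases meant here, the sketch below uses the core Test::More module against a hypothetical binary_search_index function. The function name and the choice to return -1 for a missing value are assumptions for the example, not code from the question.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical corrected binary search: returns the index of $target in
# the sorted array referenced by $array, or -1 if it is absent.
sub binary_search_index {
    my ($target, $array) = @_;
    my ($low, $high) = (0, $#{$array});    # last valid index, not the length
    while ($low <= $high) {
        my $mid = int(($low + $high) / 2);
        if    ($array->[$mid] < $target) { $low  = $mid + 1 }
        elsif ($array->[$mid] > $target) { $high = $mid - 1 }
        else                             { return $mid }
    }
    return -1;
}

my @sorted = (1 .. 10);

# Boundary cases are where off-by-one errors usually show up.
is(binary_search_index(1,  \@sorted),  0, 'first element');
is(binary_search_index(10, \@sorted),  9, 'last element');
is(binary_search_index(11, \@sorted), -1, 'value above the largest element');
is(binary_search_index(0,  \@sorted), -1, 'value below the smallest element');
is(binary_search_index(5,  []),       -1, 'empty array');

done_testing();
```

Running the same checks with warnings enabled against each search function makes out-of-range reads visible before any timing is attempted.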
You may prefer cmpthese to timethese; I haven't found the output of timethese useful.
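For comparison, a minimal cmpthese sketch might look like the following. The scan_search and halving_search functions and the 10,000-element array are illustrative stand-ins, not the code from the question.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

my @array = (1 .. 10_000);    # illustrative sorted data

# Hypothetical linear search: index of $target, or -1 if absent.
sub scan_search {
    my ($target, $array) = @_;
    for my $i (0 .. $#{$array}) {
        return $i if $array->[$i] == $target;
    }
    return -1;
}

# Hypothetical binary search over the same sorted data.
sub halving_search {
    my ($target, $array) = @_;
    my ($low, $high) = (0, $#{$array});
    while ($low <= $high) {
        my $mid     = int(($low + $high) / 2);
        my $mid_val = $array->[$mid];
        return $mid if $mid_val == $target;
        if ($mid_val < $target) { $low = $mid + 1 } else { $high = $mid - 1 }
    }
    return -1;
}

# A negative count runs each entry for at least that many CPU seconds.
# Sub references are closures, so the lexical @array is visible to them.
cmpthese(-1, {
    scan    => sub { scan_search($array[ int rand @array ], \@array) },
    halving => sub { halving_search($array[ int rand @array ], \@array) },
});
```

cmpthese prints a table of rates with the relative percentage difference between every pair of entries, which tends to be easier to read than the raw timing lines.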
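Answer 1 (score: 1)
The question of why your timings didn't match expectations has been answered, but I thought you might like to see more Perlish implementations of the three search algorithms, along with their timings.
Note that all of the functions provided by Benchmark accept a negative number as their first argument, which gives the minimum number of CPU seconds to run each benchmark for. That is usually a better approach than fixing the number of executions and guessing whether you need 100 thousand or 1 million to get a decent sample.
Also note that I have set ARR_LENGTH to 1 million.
As you would expect, linear search is the slowest at 15 searches per second, followed by binary search at 117,018 per second and interpolation search at 481,320 per second.
I hope this helps.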
use strict;
use warnings 'all';
use Benchmark 'timethese';
use constant ARR_LENGTH => 1_000_000;
STDOUT->autoflush;
my @array = ( 0 .. ARR_LENGTH-1 );
timethese(-10, {
    interpolation_search => sub {
        my $random_val = int rand @array;
        my $i = interpolation_search($random_val, \@array);
        die "Wrong result" unless $array[$i] == $random_val;
    },
    binary_search => sub {
        my $random_val = int rand @array;
        my $i = binary_search($random_val, \@array);
        die "Wrong result" unless $array[$i] == $random_val;
    },
    linear_search => sub {
        my $random_val = int rand @array;
        my $i = linear_search($random_val, \@array);
        die "Wrong result" unless $array[$i] == $random_val;
    },
} );
sub linear_search {
    my ($target, $array) = @_;
    for my $i ( 0 .. $#$array ) {
        return $i if $array->[$i] == $target;
        last if $array->[$i] > $target;
    }
    return;
}
sub binary_search {
    my ($target, $array) = @_;
    my $low = 0;
    my $high = $#$array;
    my ($mid, $mid_val);
    while ( $low <= $high ) {
        $mid = int(($low + $high) / 2);
        $mid_val = $array->[$mid];
        return $mid if $mid_val == $target;
        if ( $mid_val < $target ) {
            $low = $mid + 1;
        }
        else {
            $high = $mid - 1;
        }
    }
    return;
}
sub interpolation_search {
    my ($target, $array) = @_;
    my $low = 0;
    my $high = $#$array;
    while () {
        my ($low_val, $high_val) = @{$array}[$low, $high];
        if ( $low_val == $high_val) {
            last unless $low_val == $target;
            return $low;
        }
        last if $target < $low_val or $target > $high_val;
        my $delta_i = $high - $low;
        my $delta_v = $high_val - $low_val;
        # Truncate so that $mid is always an integer index, even when the
        # values are not evenly spaced
        my $mid = int($low + ($target - $low_val) * $delta_i / $delta_v);
        my $mid_val = $array->[$mid];
        return $mid if $mid_val == $target;
        if ( $mid_val < $target ) {
            $low = $mid + 1;
        }
        else {
            $high = $mid - 1;
        }
    }
    return;
}
Benchmark: running binary_search, interpolation_search, linear_search for at least 10 CPU seconds...
binary_search: 10 wallclock secs (10.53 usr + 0.00 sys = 10.53 CPU) @ 117018.33/s (n=1232320)
interpolation_search: 10 wallclock secs (10.39 usr + 0.00 sys = 10.39 CPU) @ 481320.50/s (n=5000920)
linear_search: 10 wallclock secs (10.05 usr + 0.00 sys = 10.05 CPU) @ 15.03/s (n=151)
Answer 2 (score: -2)
use constant ARR_LENGTH=>10000;
for $i (0..ARR_LENGTH) {
    $arr[$i]=int(rand(15000));
}
@arr = sort {$a <=> $b}(@arr);
for $i (0..ARR_LENGTH) {
    $x=int(rand(15000));
    #LinearSearch(\@arr,$x); -- uncomment this or next
    #BinarySearch(\@arr,$x);
}
I tested the code by uncommenting one call or the other. The results are consistent:
Jay-$ vim some.pl ( uncommenting Binary Search)
Jay-$ time perl some.pl
real 0m0.096s
user 0m0.087s
sys 0m0.007s
Jay-$ time perl some.pl
real 0m0.091s
user 0m0.083s
sys 0m0.007s
Jay-$ time perl some.pl
real 0m0.100s
user 0m0.091s
sys 0m0.007s
Jay-$ vim some.pl ( uncommenting Linear Search)
Jay-$ time perl some.pl
real 0m23.163s
user 0m23.121s
sys 0m0.032s
Jay-$ time perl some.pl
real 0m22.482s
user 0m22.448s
sys 0m0.025s
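Conclusion: for n = 10000, binary search is about 200 times faster, which is consistent with O(log n). If you ran it a million times as specified, that is technically about 5 billion iterations of linear search (roughly 5,000 comparisons per search on average), which is not possible in the 2 seconds you reported. Also, are the random values you pass in uniformly distributed?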
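The back-of-the-envelope estimate above can be worked through as follows. This is only a sketch: it assumes every search succeeds, that targets are uniformly distributed, and that a linear search therefore inspects about n/2 elements on average.

```perl
use strict;
use warnings;
use POSIX 'ceil';

my $n        = 10_000;       # array length from the question
my $searches = 1_000_000;    # benchmark iterations from the question

# Assumed average cost of one successful linear search: about n/2 comparisons.
my $linear_per_search = $n / 2;
my $linear_total      = $linear_per_search * $searches;

# Binary search halves the range each step, so it needs roughly log2(n) probes.
my $binary_per_search = ceil(log($n) / log(2));
my $binary_total      = $binary_per_search * $searches;

printf "linear: %.0f comparisons in total\n", $linear_total;
printf "binary: %.0f probes in total\n",      $binary_total;
```

Under these assumptions the linear total is 5 billion comparisons, against about 14 million probes for binary search.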