我正在寻找一种快速计算数字列表中位数的方法,并且发现了这一点:
function array_median($array) {
// perhaps all non numeric values should filtered out of $array here?
$iCount = count($array);
if ($iCount == 0) {
return null;
}
// if we're down here it must mean $array
// has at least 1 item in the array.
$middle_index = floor($iCount / 2);
sort($array, SORT_NUMERIC);
$median = $array[$middle_index]; // assume an odd # of items
// Handle the even case by averaging the middle 2 items
if ($iCount % 2 == 0) {
$median = ($median + $array[$middle_index - 1]) / 2;
}
return $median;
}
这种使用sort()
的方法很有意义,当然是显而易见的方法。但是,如果中间堆更快,我很好奇。令人惊讶的是,当我实现一个简单的中间堆时,它始终比上述方法慢得多。
我简单的MedianHeap类:
class MedianHeap{
private $lowerHeap;
private $higherHeap;
private $numbers = [];
public function __construct($numbers = null)
{
$this->lowerHeap = new SplMaxHeap();
$this->higherHeap = new SplMinHeap();
if (count($numbers)) {
$this->insertArray($numbers);
}
}
public function insertArray ($numbers) {
foreach($numbers as $number) {
$this->insert($number);
}
}
public function insert($number)
{
$this->numbers[] = $number;
if ($this->lowerHeap->count() == 0 || $number < $this->lowerHeap->top()) {
$this->lowerHeap->insert($number);
} else {
$this->higherHeap->insert($number);
}
$this->balance();
}
protected function balance()
{
$biggerHeap = $this->lowerHeap->count() > $this->higherHeap->count() ? $this->lowerHeap : $this->higherHeap;
$smallerHeap = $this->lowerHeap->count() > $this->higherHeap->count() ? $this->higherHeap : $this->lowerHeap;
if ($biggerHeap->count() - $smallerHeap->count() >= 2) {
$smallerHeap->insert($biggerHeap->extract());
}
}
public function getMedian()
{
if (!count($this->numbers)) {
return null;
}
$biggerHeap = $this->lowerHeap->count() > $this->higherHeap->count() ? $this->lowerHeap : $this->higherHeap;
$smallerHeap = $this->lowerHeap->count() > $this->higherHeap->count() ? $this->higherHeap : $this->lowerHeap;
if ($biggerHeap->count() == $smallerHeap->count()) {
return ($biggerHeap->top() + $smallerHeap->top())/2;
} else {
return $biggerHeap->top();
}
}
}
然后是基准代码:
$array = [];
for($i=0; $i<100000; $i++) {
$array[] = mt_rand(1,100000) / mt_rand(1,10000);
}
$t = microtime(true);
echo array_median($array);
echo PHP_EOL . 'Sort Median: ' . (microtime(true) - $t) . ' seconds';
echo PHP_EOL;
$t = microtime(true);
$list = new MedianHeap($array);
echo $list->getMedian();
echo PHP_EOL . 'Heap Median: '. (microtime(true) - $t) . ' seconds';
在PHP中是否有某种东西会以某种方式使用堆低效率或者我的实现有什么问题?