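What I'm looking for is the highest-rated clubs, taking into account the number of votes and, obviously, the rating itself.
What I've done is the following:
(image: club weighted median)
The problem is that I can't figure out why my data is displaying incorrectly. I think there's a problem with my calculation. I'm getting numbers in the tens, and negative ones, when I should be receiving values from 0-5 (which is what the ratings are).
I'm not quite sure where my logic is failing.
Here is my code logic for the ratings: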
$sql="SELECT SUM(rating) AS sumrating,COUNT(reviews.rating) AS countrating FROM reviews";
$rs=mysqli_fetch_array(mysqli_query($scx_dbh,$sql));
// get the total summation of ratings against all reviews
$ratingssum=(int)$rs['sumrating'];
// get the total number of ratings against all reviews
$ratingscount=(int)$rs['countrating'];
// get the population mean
$mean = $ratingssum / $ratingscount;
// determine the variance of the population
$variance = 0;
$sql="SELECT rating AS score FROM reviews";
$rs=mysqli_query($scx_dbh,$sql);
while($row=mysqli_fetch_array($rs)){
    $score = (int)$row['score'];
    $variance += pow(($score-$mean),2);
}
$variance = $variance/$ratingscount;
// loop through all clubs and implement new rating
$scores=array();
$sql="SELECT locid,COUNT(reviewid) AS locationrecordcount,AVG(rating) AS locationmedian FROM reviews GROUP BY locid";
$rs=mysqli_query($scx_dbh,$sql);
/// begin loop
while($row=mysqli_fetch_array($rs)){
    // get the number of review votes for this club
    $numvotes=(int)$row['locationrecordcount'];
    // get the location id
    $locId = (int)$row['locid'];
    // find the standard deviation for this club (total variance * numclubvotes)
    $standarddev=sqrt($variance*$numvotes);
    // create the new rating for this club with 1 standard deviation less
    $oldRating=$row['locationmedian'];
    $newRating=$oldRating-$standarddev;
    $scores[$locId] = array(
        'numvotes'=>$numvotes,
        'standard-deviation'=>$standarddev,
        'original-rating'=> $oldRating,
        'weighted-rating'=>$newRating
    );
}
usort($scores,function($a,$b){
    return $a['weighted-rating']-$b['weighted-rating'];
});
Here are my results:
Top 10
[0] => Array
(
[numvotes] => 1121
[standard-deviation] => 68.898321138853
[original-rating] => 4.415700267618207
[weighted-rating] => -64.482620871235
)
[1] => Array
(
[numvotes] => 909
[standard-deviation] => 62.042283630954
[original-rating] => 3.1290979097910174
[weighted-rating] => -58.913185721163
)
[2] => Array
(
[numvotes] => 594
[standard-deviation] => 50.153247058093
[original-rating] => 4.414225589225589
[weighted-rating] => -45.739021468868
)
[3] => Array
(
[numvotes] => 505
[standard-deviation] => 46.243587892712
[original-rating] => 4.090099009900985
[weighted-rating] => -42.153488882811
)
[4] => Array
(
[numvotes] => 517
[standard-deviation] => 46.78979093937
[original-rating] => 4.661025145067699
[weighted-rating] => -42.128765794302
)
[5] => Array
(
[numvotes] => 505
[standard-deviation] => 46.243587892712
[original-rating] => 3.2117821782178173
[weighted-rating] => -43.031805714494
)
[6] => Array
(
[numvotes] => 398
[standard-deviation] => 41.053233483774
[original-rating] => 4.231155778894469
[weighted-rating] => -36.822077704879
)
[7] => Array
(
[numvotes] => 340
[standard-deviation] => 37.944190471069
[original-rating] => 3.9102941176470547
[weighted-rating] => -34.033896353422
)
[8] => Array
(
[numvotes] => 323
[standard-deviation] => 36.983422110177
[original-rating] => 3.261145510835913
[weighted-rating] => -33.722276599341
)
[9] => Array
(
[numvotes] => 280
[standard-deviation] => 34.433791770728
[original-rating] => 3.36767857142857
[weighted-rating] => -31.066113199299
)
[10] => Array
(
[numvotes] => 254
[standard-deviation] => 32.796136967109
[original-rating] => 3.1411417322834665
[weighted-rating] => -29.654995234825
)
Bottom 10
[232] => Array
(
[numvotes] => 2
[standard-deviation] => 2.9101865621466
[original-rating] => 4.95
[weighted-rating] => 2.0398134378534
)
[233] => Array
(
[numvotes] => 2
[standard-deviation] => 2.9101865621466
[original-rating] => 5
[weighted-rating] => 2.0898134378534
)
[234] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 4
[weighted-rating] => 1.9421873473882
)
[235] => Array
(
[numvotes] => 2
[standard-deviation] => 2.9101865621466
[original-rating] => 4.8
[weighted-rating] => 1.8898134378534
)
[236] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 3.25
[weighted-rating] => 1.1921873473882
)
[237] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 5
[weighted-rating] => 2.9421873473882
)
[238] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 5
[weighted-rating] => 2.9421873473882
)
[239] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 4.1
[weighted-rating] => 2.0421873473882
)
[240] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 5
[weighted-rating] => 2.9421873473882
)
[241] => Array
(
[numvotes] => 2
[standard-deviation] => 2.9101865621466
[original-rating] => 5
[weighted-rating] => 2.0898134378534
)
)
UPDATE
OK, so I've recalculated the standard deviation against the whole population. It is 2.0578126526118.
Here's my current code:
$sql="SELECT SUM(reviews.rating) AS sumrating,COUNT(reviews.rating) AS countrating FROM reviews";
$rs=mysqli_fetch_array(mysqli_query($scx_dbh,$sql));
$ratingssum=(int)$rs['sumrating'];
$ratingscount=(int)$rs['countrating'];
$mean = $ratingssum / $ratingscount;
$variance = 0;
$sql="SELECT rating AS score FROM reviews";
$rs=mysqli_query($scx_dbh,$sql);
while($row=mysqli_fetch_array($rs)){
    $score = (int)$row['score'];
    $variance += pow(($score-$mean),2);
}
$variance = $variance/$ratingscount;
$standarddev=sqrt($variance);
$scores=array();
$sql="SELECT locid,COUNT(reviewid) AS locationrecordcount,AVG(rating) AS locationmedian FROM reviews GROUP BY locid";
$rs=mysqli_query($scx_dbh,$sql);
while($row=mysqli_fetch_array($rs)){
    $numvotes=(int)$row['locationrecordcount'];
    $locId = (int)$row['locid'];
    $oldRating=$row['locationmedian'];
    $newRating=$oldRating-$standarddev;
    $scores[$locId] = array(
        'numvotes'=>$numvotes,
        'standard-deviation'=>$standarddev,
        'original-rating'=> $oldRating,
        'weighted-rating'=>$newRating
    );
}
usort($scores,function($a,$b){
    return (int)($a['weighted-rating']-$b['weighted-rating']);
});
I think my sorting function is incorrect. After sorting with it, these are the top 5:
[0] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 0.2
[weighted-rating] => -1.8578126526118
)
[1] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 0.05
[weighted-rating] => -2.0078126526118
)
[2] => Array
(
[numvotes] => 4
[standard-deviation] => 2.0578126526118
[original-rating] => 0.7625
[weighted-rating] => -1.2953126526118
)
[3] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 0.1
[weighted-rating] => -1.9578126526118
)
[4] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 0.4
[weighted-rating] => -1.6578126526118
)
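As you can see, besides the values being negative, the weighted-average for position 1 (index 0) is -1.85 while position 2 (index 1) is -2.00. I think there's a problem either with my algorithm or with the sorting function in my code, or else why are negative numbers being sorted first?
Also, clubs with only 1 vote are landing in position 1. The purpose of this algorithm is to weed those clubs out so I can focus on the clubs with 1000 votes.
Here's the bottom 5, which shows the same behavior:
[237] => Array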
(
[numvotes] => 29
[standard-deviation] => 2.0578126526118
[original-rating] => 4.112068965517241
[weighted-rating] => 2.0542563129054
)
[238] => Array
(
[numvotes] => 5
[standard-deviation] => 2.0578126526118
[original-rating] => 3.8800000000000003
[weighted-rating] => 1.8221873473882
)
[239] => Array
(
[numvotes] => 31
[standard-deviation] => 2.0578126526118
[original-rating] => 3.7499999999999996
[weighted-rating] => 1.6921873473882
)
[240] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 5
[weighted-rating] => 2.9421873473882
)
[241] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 4.45
[weighted-rating] => 2.3921873473882
)
Position 5 (index 241) has a weighted-average of 2.39 while position 4 (index 240) has a weighted-average of 2.94.
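Answer 0 (score: 0)
The standard deviation is calculated as the square root of the variance, not the square root of the variance multiplied by the population (numvotes):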
// find the standard deviation for this club (total variance)
$standarddev=sqrt($variance);
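If you want to weigh each club on its own, then you need to calculate the variance (and standard deviation) for each club. To do that, you need to sum over only that club's votes, not all votes, and then calculate the variance and standard deviation. The variance and standard deviation of all votes then seem unnecessary.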
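A minimal sketch of that per-club calculation, assuming the reviews have already been fetched into an array of rows with locid and rating keys. The function name stdDevPerClub and the sample rows are made up for illustration, and ratings are cast to float rather than int because the data in the question contains fractional ratings such as 4.45:

```php
<?php
// Hypothetical helper: population standard deviation per club.
// $reviews is assumed to be a list of rows like ['locid' => 1, 'rating' => 4.5].
function stdDevPerClub(array $reviews): array
{
    // Group the ratings by club id.
    $byClub = [];
    foreach ($reviews as $row) {
        $byClub[(int)$row['locid']][] = (float)$row['rating'];
    }

    $result = [];
    foreach ($byClub as $locId => $ratings) {
        $n = count($ratings);
        $mean = array_sum($ratings) / $n;

        // Sum of squared differences from this club's own mean.
        $sumSquares = 0.0;
        foreach ($ratings as $rating) {
            $sumSquares += pow($rating - $mean, 2);
        }

        $result[$locId] = [
            'numvotes' => $n,
            'mean' => $mean,
            'standard-deviation' => sqrt($sumSquares / $n),
        ];
    }
    return $result;
}

// Made-up sample data: club 1 has votes 2 and 4, club 2 has a single vote of 5.
$stats = stdDevPerClub([
    ['locid' => 1, 'rating' => 2],
    ['locid' => 1, 'rating' => 4],
    ['locid' => 2, 'rating' => 5],
]);
```

Each entry of $stats then holds the vote count, mean and standard deviation for one club only, so nothing from the whole population leaks into a club's figures.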
UPDATE
What you are trying to accomplish (weeding out clubs with very few votes) cannot be done with the standard deviation (σ).
Consider the following:
5/1=5, (5-5)^2 / 1=0, sqrt(0)=0
1/1=1, (1-1)^2 / 1=0, sqrt(0)=0
10/2=5, ((5-5)^2 + (5-5)^2) / 2=0, sqrt(0)=0
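Now you might think you can weed out the clubs with a low σ.
6/2=3, ((1-3)^2 + (5-3)^2) / 2=4, sqrt(4)=2
As you can see, nothing here says "hey, this club received a lot of votes". The only thing σ tells you is how spread out the votes are. If there is no spread or only a small spread (variance), σ will be 0 or small, and vice versa.
What you could try is to look at the difference between the club σ (Cσ) and the total σ (Tσ). If that value is close to 0 (within some limit, say 0.1), then you know the variation within that club is similar to the variation in the whole population. But this still doesn't guarantee a minimum of x votes.
This calculation would look something like abs(Cσ - Tσ) < 0.1.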
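As a rough sketch, that comparison could be wrapped in a small function. The name hasSimilarSpread and the default limit of 0.1 are only illustrative. Since σ says nothing about the vote count, the second hypothetical helper, filterByMinVotes, checks numvotes separately against an arbitrary threshold; this is an assumption about how one might enforce a minimum, not something σ provides:

```php
<?php
// Hypothetical helper: true when the club's σ is within $limit of the total σ.
function hasSimilarSpread(float $clubSigma, float $totalSigma, float $limit = 0.1): bool
{
    return abs($clubSigma - $totalSigma) < $limit;
}

// Hypothetical helper: drop clubs with fewer than $minVotes votes.
// σ cannot do this, so the vote count is checked explicitly.
function filterByMinVotes(array $scores, int $minVotes): array
{
    return array_filter($scores, function ($club) use ($minVotes) {
        return $club['numvotes'] >= $minVotes;
    });
}

// Made-up example values.
$totalSigma = 2.06;
$similar = hasSimilarSpread(2.0, $totalSigma);   // difference of 0.06
$different = hasSimilarSpread(0.0, $totalSigma); // difference of 2.06

$scores = [
    10 => ['numvotes' => 1121, 'original-rating' => 4.42],
    11 => ['numvotes' => 1, 'original-rating' => 5.0],
];
$kept = filterByMinVotes($scores, 100);
```

array_filter preserves the array keys, so $kept is still indexed by club id.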
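Regarding the sort function:
usort expects the callback to return an integer less than, equal to, or greater than zero. When you return the difference of two floats, it is cast to an integer, so any difference smaller than 1 in magnitude is truncated to 0 and the two entries are treated as equal, which produces rather odd results. A correct sort function should look like this: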
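usort($scores, function ($a, $b) {
    // equal ratings keep their relative order as far as usort allows
    if ($a['weighted-rating'] == $b['weighted-rating']) {
        return 0;
    }
    // ascending order: lower weighted-rating first
    return ($a['weighted-rating'] < $b['weighted-rating']) ? -1 : 1;
});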
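On PHP 7 or later, the same comparison can be written more compactly with the spaceship operator (<=>), which returns -1, 0 or 1. A small sketch with made-up data; swapping $a and $b gives a descending order, which puts the highest weighted-rating first:

```php
<?php
// Made-up sample data using the same keys as the question.
$scores = [
    ['numvotes' => 1, 'weighted-rating' => -1.85],
    ['numvotes' => 1, 'weighted-rating' => -2.00],
    ['numvotes' => 4, 'weighted-rating' => -1.29],
];

// Ascending: lowest weighted-rating first.
usort($scores, function ($a, $b) {
    return $a['weighted-rating'] <=> $b['weighted-rating'];
});
$ascending = array_column($scores, 'weighted-rating');

// Descending: highest weighted-rating first ($b is compared against $a).
usort($scores, function ($a, $b) {
    return $b['weighted-rating'] <=> $a['weighted-rating'];
});
$descending = array_column($scores, 'weighted-rating');
```

Because the operator compares the floats directly, no subtraction or integer cast is involved, so differences smaller than 1 are ordered correctly.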
Answer 1 (score: 0)
$standarddev=sqrt($variance*$numvotes);
should be
$standarddev=sqrt($variance);
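EDIT
Your question is that you can't find the error in your logic. The reason is that you have one big, complex function. You should look into test-driven development and split your code into small, easily testable units of work. For each unit of work you can test the expected output for different input values. That way you can more easily rule out parts of the code, e.g. a stdCalculator, because that part is covered by a series of test cases.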
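As an illustration of that idea, here is a small sketch of one such unit together with a few checks. The function name populationVariance is hypothetical, and plain assert() calls stand in for a real test framework such as PHPUnit:

```php
<?php
// Hypothetical unit of work: population variance of a list of numbers.
function populationVariance(array $values): float
{
    $n = count($values);
    if ($n === 0) {
        throw new InvalidArgumentException('At least one value is required.');
    }
    $mean = array_sum($values) / $n;

    // Sum of squared differences from the mean.
    $sumSquares = 0.0;
    foreach ($values as $value) {
        $sumSquares += pow($value - $mean, 2);
    }
    return $sumSquares / $n;
}

// Test cases: known inputs with hand-computed expected outputs.
assert(populationVariance([5]) === 0.0);
assert(populationVariance([5, 5]) === 0.0);
assert(abs(populationVariance([2, 4]) - 1.0) < 1e-9);
```

Once a unit like this passes its tests, a wrong final rating has to come from somewhere else, such as the way the variance is combined with the vote count or the sort callback.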