Problem with my rating algorithm?

Asked: 2015-11-09 16:40:32

Tags: php algorithm

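What I am looking for is the highest-rated clubs, based on the number of votes and, obviously, the highest rating.

What I have done is the following:

  • Calculate the variance of the entire population (all votes and all clubs)
  • Determine the standard deviation for each club from the population variance
  • Calculate a new club weighted median by subtracting one standard deviation, to weed out clubs with a low number of votes

The problem is that I cannot work out why my data is coming out wrong. I think there is a problem with my calculations. I am getting numbers in the tens, and negative ones, when I should be receiving values from 0 to 5 (the same range as the ratings).

I am not quite sure where my logic has failed.

Here is my code logic for the rating: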

    $sql="SELECT SUM(rating) AS sumrating,COUNT(reviews.rating) AS countrating FROM reviews";   
    $rs=mysqli_fetch_array(mysqli_query($scx_dbh,$sql));    

    // get the total summation of ratings against all reviews
    $ratingssum=(int)$rs['sumrating'];

    // get the total number of ratings against all reviews
    $ratingscount=(int)$rs['countrating'];  

    // get the population mean
    $mean = $ratingssum / $ratingscount;    

    // determine the variance of the population
    $variance = 0;
    $sql="SELECT rating AS score FROM reviews";
    $rs=mysqli_query($scx_dbh,$sql);
    while($row=mysqli_fetch_array($rs)){        
        $score = (int)$row['score'];        
        $variance += pow(($score-$mean),2);                     
    }
    $variance = $variance/$ratingscount;

    // loop through all clubs and implement new rating
    $scores=array();
    $sql="SELECT locid,COUNT(reviewid) AS locationrecordcount,AVG(rating) AS locationmedian FROM reviews GROUP BY locid";
    $rs=mysqli_query($scx_dbh,$sql);

    /// begin loop
    while($row=mysqli_fetch_array($rs)){

        // get the number of review votes for this club
        $numvotes=(int)$row['locationrecordcount'];

        // get the location id 
        $locId = (int)$row['locid'];

        // find the standard deviation for this club (total variance * numclubvotes)
        $standarddev=sqrt($variance*$numvotes);

        // create the new rating for this club with 1 standard deviation less
        $oldRating=$row['locationmedian'];          
        $newRating=$oldRating-$standarddev;
        $scores[$locId] = array(            
          'numvotes'=>$numvotes,
          'standard-deviation'=>$standarddev,
          'original-rating'=> $oldRating,
          'weighted-rating'=>$newRating
       );
    }

    usort($scores,function($a,$b){
        return $a['weighted-rating']-$b['weighted-rating'];
    });

Here are my results:

Top 10

 [0] => Array
        (
            [numvotes] => 1121
            [standard-deviation] => 68.898321138853
            [original-rating] => 4.415700267618207
            [weighted-rating] => -64.482620871235
        )

    [1] => Array
        (
            [numvotes] => 909
            [standard-deviation] => 62.042283630954
            [original-rating] => 3.1290979097910174
            [weighted-rating] => -58.913185721163
        )

    [2] => Array
        (
            [numvotes] => 594
            [standard-deviation] => 50.153247058093
            [original-rating] => 4.414225589225589
            [weighted-rating] => -45.739021468868
        )

    [3] => Array
        (
            [numvotes] => 505
            [standard-deviation] => 46.243587892712
            [original-rating] => 4.090099009900985
            [weighted-rating] => -42.153488882811
        )

    [4] => Array
        (
            [numvotes] => 517
            [standard-deviation] => 46.78979093937
            [original-rating] => 4.661025145067699
            [weighted-rating] => -42.128765794302
        )

    [5] => Array
        (
            [numvotes] => 505
            [standard-deviation] => 46.243587892712
            [original-rating] => 3.2117821782178173
            [weighted-rating] => -43.031805714494
        )

    [6] => Array
        (
            [numvotes] => 398
            [standard-deviation] => 41.053233483774
            [original-rating] => 4.231155778894469
            [weighted-rating] => -36.822077704879
        )

    [7] => Array
        (
            [numvotes] => 340
            [standard-deviation] => 37.944190471069
            [original-rating] => 3.9102941176470547
            [weighted-rating] => -34.033896353422
        )

    [8] => Array
        (
            [numvotes] => 323
            [standard-deviation] => 36.983422110177
            [original-rating] => 3.261145510835913
            [weighted-rating] => -33.722276599341
        )

    [9] => Array
        (
            [numvotes] => 280
            [standard-deviation] => 34.433791770728
            [original-rating] => 3.36767857142857
            [weighted-rating] => -31.066113199299
        )

    [10] => Array
        (
            [numvotes] => 254
            [standard-deviation] => 32.796136967109
            [original-rating] => 3.1411417322834665
            [weighted-rating] => -29.654995234825
        )

Bottom 10

[232] => Array
    (
        [numvotes] => 2
        [standard-deviation] => 2.9101865621466
        [original-rating] => 4.95
        [weighted-rating] => 2.0398134378534
    )

[233] => Array
    (
        [numvotes] => 2
        [standard-deviation] => 2.9101865621466
        [original-rating] => 5
        [weighted-rating] => 2.0898134378534
    )

[234] => Array
    (
        [numvotes] => 1
        [standard-deviation] => 2.0578126526118
        [original-rating] => 4
        [weighted-rating] => 1.9421873473882
    )

[235] => Array
    (
        [numvotes] => 2
        [standard-deviation] => 2.9101865621466
        [original-rating] => 4.8
        [weighted-rating] => 1.8898134378534
    )

[236] => Array
    (
        [numvotes] => 1
        [standard-deviation] => 2.0578126526118
        [original-rating] => 3.25
        [weighted-rating] => 1.1921873473882
    )

[237] => Array
    (
        [numvotes] => 1
        [standard-deviation] => 2.0578126526118
        [original-rating] => 5
        [weighted-rating] => 2.9421873473882
    )

[238] => Array
    (
        [numvotes] => 1
        [standard-deviation] => 2.0578126526118
        [original-rating] => 5
        [weighted-rating] => 2.9421873473882
    )

[239] => Array
    (
        [numvotes] => 1
        [standard-deviation] => 2.0578126526118
        [original-rating] => 4.1
        [weighted-rating] => 2.0421873473882
    )

[240] => Array
    (
        [numvotes] => 1
        [standard-deviation] => 2.0578126526118
        [original-rating] => 5
        [weighted-rating] => 2.9421873473882
    )

[241] => Array
    (
        [numvotes] => 2
        [standard-deviation] => 2.9101865621466
        [original-rating] => 5
        [weighted-rating] => 2.0898134378534
    )

Update

OK, so I recalculated the standard deviation for the whole population. It is 2.0578126526118.

Here is my current code:

    $sql="SELECT SUM(reviews.rating) AS sumrating,COUNT(reviews.rating) AS countrating FROM reviews";   
    $rs=mysqli_fetch_array(mysqli_query($scx_dbh,$sql));    
    $ratingssum=(int)$rs['sumrating'];
    $ratingscount=(int)$rs['countrating'];  
    $mean = $ratingssum / $ratingscount;    
    $variance = 0;
    $sql="SELECT rating AS score FROM reviews";
    $rs=mysqli_query($scx_dbh,$sql);
    while($row=mysqli_fetch_array($rs)){        
        $score = (int)$row['score'];        
        $variance += pow(($score-$mean),2);                     
    }
    $variance = $variance/$ratingscount;
    $standarddev=sqrt($variance);
    $scores=array();
    $sql="SELECT locid,COUNT(reviewid) AS locationrecordcount,AVG(rating) AS locationmedian FROM reviews GROUP BY locid";
    $rs=mysqli_query($scx_dbh,$sql);
    while($row=mysqli_fetch_array($rs)){
        $numvotes=(int)$row['locationrecordcount'];
        $locId = (int)$row['locid'];        
        $oldRating=$row['locationmedian'];
        $newRating=$oldRating-$standarddev;
        $scores[$locId] = array(            
            'numvotes'=>$numvotes,
            'standard-deviation'=>$standarddev,
            'original-rating'=> $oldRating,
            'weighted-rating'=>$newRating
        );
    }   
    usort($scores,function($a,$b){
        return (int)($a['weighted-rating']-$b['weighted-rating']);
    });

I think my sort function is incorrect. After sorting with it, these are the top 5:

     [0] => Array
            (
                [numvotes] => 1
                [standard-deviation] => 2.0578126526118
                [original-rating] => 0.2
                [weighted-rating] => -1.8578126526118
            )

        [1] => Array
            (
                [numvotes] => 1
                [standard-deviation] => 2.0578126526118
                [original-rating] => 0.05
                [weighted-rating] => -2.0078126526118
            )

        [2] => Array
            (
                [numvotes] => 4
                [standard-deviation] => 2.0578126526118
                [original-rating] => 0.7625
                [weighted-rating] => -1.2953126526118
            )

        [3] => Array
            (
                [numvotes] => 1
                [standard-deviation] => 2.0578126526118
                [original-rating] => 0.1
                [weighted-rating] => -1.9578126526118
            )

        [4] => Array
            (
                [numvotes] => 1
                [standard-deviation] => 2.0578126526118
                [original-rating] => 0.4
                [weighted-rating] => -1.6578126526118
            )

As you can see, apart from these being negative numbers, the weighted-average for position 1 (index 0) is -1.85 while position 2 (index 1) is -2.00. I think there is a problem with my algorithm or with the sort function in my code, or else why are negative numbers being sorted first?

Also, I am getting clubs in the number 1 position when they have only 1 vote. The purpose of this algorithm is to weed those clubs out so that I can focus on the clubs that have 1000 votes.

Here are the bottom 5:

    [237] => Array
        (
            [numvotes] => 29
            [standard-deviation] => 2.0578126526118
            [original-rating] => 4.112068965517241
            [weighted-rating] => 2.0542563129054
        )

    [238] => Array
        (
            [numvotes] => 5
            [standard-deviation] => 2.0578126526118
            [original-rating] => 3.8800000000000003
            [weighted-rating] => 1.8221873473882
        )

    [239] => Array
        (
            [numvotes] => 31
            [standard-deviation] => 2.0578126526118
            [original-rating] => 3.7499999999999996
            [weighted-rating] => 1.6921873473882
        )

    [240] => Array
        (
            [numvotes] => 1
            [standard-deviation] => 2.0578126526118
            [original-rating] => 5
            [weighted-rating] => 2.9421873473882
        )

    [241] => Array
        (
            [numvotes] => 1
            [standard-deviation] => 2.0578126526118
            [original-rating] => 4.45
            [weighted-rating] => 2.3921873473882
        )

The same behavior shows in the bottom 5: position 5 (index 241) has a weighted-average of 2.39, while position 4 (index 240) has 2.94.

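2 answers:

Answer 0 (score: 0)

The standard deviation is the square root of the variance, not the square root of the variance multiplied by the population size (numvotes):

    // find the standard deviation for this club (total variance)
    $standarddev=sqrt($variance);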

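If you want to weight each club on its own, you need to calculate the variance (and standard deviation) for each club. To do that, you need the sum of only that club's votes, not of all votes, and then calculate the variance and standard deviation from that. The variance and standard deviation of all votes then seem unnecessary.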
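As a minimal sketch of the per-club calculation, assuming the reviews have already been fetched as rows with `locid` and `rating` (the `$rows` sample below is made up, and names such as `$acc` and `$clubStats` are illustrative only), you could accumulate the count, the sum, and the sum of squares per club in one pass and derive each club's population σ from those. Ratings are cast to float here because the dumps in the question show fractional ratings:

```php
<?php
// Hypothetical sample rows, shaped like "SELECT locid, rating FROM reviews".
$rows = [
    ['locid' => 1, 'rating' => 5],
    ['locid' => 1, 'rating' => 5],
    ['locid' => 2, 'rating' => 1],
    ['locid' => 2, 'rating' => 5],
    ['locid' => 3, 'rating' => 4],
];

// Accumulate count, sum and sum of squares per club in a single pass.
$acc = [];
foreach ($rows as $row) {
    $id = (int)$row['locid'];
    $rating = (float)$row['rating'];
    if (!isset($acc[$id])) {
        $acc[$id] = ['n' => 0, 'sum' => 0.0, 'sumsq' => 0.0];
    }
    $acc[$id]['n']++;
    $acc[$id]['sum'] += $rating;
    $acc[$id]['sumsq'] += $rating * $rating;
}

// Population variance = mean of the squares minus the square of the mean.
$clubStats = [];
foreach ($acc as $id => $a) {
    $mean = $a['sum'] / $a['n'];
    // max() guards against a tiny negative value caused by rounding.
    $variance = max(0.0, $a['sumsq'] / $a['n'] - $mean * $mean);
    $clubStats[$id] = [
        'numvotes' => $a['n'],
        'mean' => $mean,
        'standard-deviation' => sqrt($variance),
    ];
}

print_r($clubStats);
```

With this sample, clubs 1 and 3 come out with σ = 0 and club 2 with σ = 2, which only describes how spread out each club's votes are.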

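Update

What you are trying to accomplish (weeding out clubs with few votes) cannot be done with the standard deviation (σ).

Consider the following:

  • A club has 1 vote: [5]. Then σ is 0. 5/1=5, (5-5)^2 / 1=0, sqrt(0)=0
  • A club has 1 vote: [1]. σ is again 0. 1/1=1, (1-1)^2 / 1=0, sqrt(0)=0
  • A club has 2 votes: [5,5]. σ is again 0. 10/2=5, ((5-5)^2 + (5-5)^2) / 2=0, sqrt(0)=0

Now you might think you could weed out clubs with a low σ.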

  • A club has 2 votes: [1,5]. σ is now 2. 6/2=3, ((1-3)^2 + (5-3)^2) / 2=4, sqrt(4)=2

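As you can see, nothing here says "hey, this club received a lot of votes". The only thing σ tells you is how widely the votes are spread. If there is no spread or only a small spread (variance), σ will be 0 or small, and vice versa.

What you could try is looking at the difference between the club σ (Cσ) and the total σ (Tσ). If that value is close to 0 (within some limit, say 0.1), then you know that the variation within that club is similar to the variation in the whole population. But this still does not guarantee a minimum of x votes. The calculation would look like abs(Cσ - Tσ) < 0.1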
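That check could be sketched as follows. `isSpreadSimilar` is a hypothetical helper name, the 0.1 limit is just the example value mentioned above, and the total σ is the population value reported in the question's update:

```php
<?php
// Returns true when a club's sigma is within $limit of the overall sigma.
function isSpreadSimilar($clubSigma, $totalSigma, $limit = 0.1)
{
    return abs($clubSigma - $totalSigma) < $limit;
}

$totalSigma = 2.0578126526118; // population sigma reported in the question

var_dump(isSpreadSimilar(2.0, $totalSigma)); // difference of about 0.058, inside the limit
var_dump(isSpreadSimilar(0.0, $totalSigma)); // a single vote has no spread, outside the limit
```

Note that nothing in this check looks at numvotes, so a club with only a couple of votes can still pass it.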

Regarding the sort function:

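usort expects the comparison callback to return an integer less than, equal to, or greater than zero. When you return the difference of two floats, it is cast to an integer, so any two ratings less than 1 apart compare as equal and you get rather odd results. A correct sort function should look like this: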

    // sorts ascending by weighted rating
    usort($scores, function ($a, $b) {
        if ($a['weighted-rating'] == $b['weighted-rating']) {
            return 0;
        }
        return ($a['weighted-rating'] < $b['weighted-rating']) ? -1 : 1;
    });
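If you are on PHP 7 or later (an assumption, since the question does not state a version), the same kind of comparison can be written with the spaceship operator `<=>`. The sketch below uses the five weighted ratings from the question's top 5, rounded to four decimals, and sorts highest first, which I am assuming is the order you want:

```php
<?php
// The five weighted ratings from the question's "top 5" dump, rounded.
$scores = [
    ['numvotes' => 1, 'weighted-rating' => -1.8578],
    ['numvotes' => 1, 'weighted-rating' => -2.0078],
    ['numvotes' => 4, 'weighted-rating' => -1.2953],
    ['numvotes' => 1, 'weighted-rating' => -1.9578],
    ['numvotes' => 1, 'weighted-rating' => -1.6578],
];

// Casting the difference to int makes ratings less than 1 apart compare equal.
$truncated = (int)(-1.8578 - -2.0078); // 0, although the ratings differ

// $b on the left and $a on the right sorts the highest rating first.
usort($scores, function ($a, $b) {
    return $b['weighted-rating'] <=> $a['weighted-rating'];
});

$sorted = array_column($scores, 'weighted-rating');
// $sorted is now [-1.2953, -1.6578, -1.8578, -1.9578, -2.0078]
print_r($sorted);
```

Swapping `$a` and `$b` back gives the ascending order of the comparison function above.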

Answer 1 (score: 0)

    $standarddev=sqrt($variance*$numvotes);

should be

    $standarddev=sqrt($variance);
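To see why the first version produces values in the tens, here is a small worked check using two numbers taken from the question: the population σ from the update and the vote count of the first club in the results. The variable names `$buggy` and `$fixed` are just illustrative:

```php
<?php
$sigma = 2.0578126526118; // population standard deviation from the question's update
$variance = $sigma * $sigma;
$numvotes = 1121;         // vote count of the first club in the question's results

// Multiplying by $numvotes inside the square root scales sigma by sqrt($numvotes).
$buggy = sqrt($variance * $numvotes); // about 68.898, as in the question's dump
$fixed = sqrt($variance);             // about 2.058, independent of $numvotes

printf("buggy: %.3f, fixed: %.3f\n", $buggy, $fixed);
```

Subtracting a value near 68.9 from a rating between 0 and 5 is what pushes the weighted rating far below zero, and the effect grows with the number of votes.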

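Edit

Your problem is that you cannot find the error in your logic. The reason is that you have one big, complex function. You should look into test-driven development and split your code into small, easily testable units of work. For each unit of work, you can test the expected output for different input values. That way you can more easily rule out parts of the code, such as a stdCalculator, because that part is covered by a series of test cases.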
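As a rough illustration of what such a unit and its test cases could look like (`stdCalculator` here is a hypothetical function, not code from the question, and it computes the population standard deviation):

```php
<?php
// Hypothetical stdCalculator: population standard deviation of a list of ratings.
function stdCalculator(array $ratings)
{
    $n = count($ratings);
    if ($n === 0) {
        return 0.0; // assumption: treat "no votes" as zero spread
    }
    $mean = array_sum($ratings) / $n;
    $squaredDiffs = 0.0;
    foreach ($ratings as $rating) {
        $squaredDiffs += pow($rating - $mean, 2);
    }
    return sqrt($squaredDiffs / $n);
}

// Test cases: known inputs paired with hand-computed outputs.
$cases = [
    [[5], 0.0],
    [[5, 5], 0.0],
    [[1, 5], 2.0],              // mean 3, variance (4 + 4) / 2 = 4
    [[1, 2, 3, 4, 5], sqrt(2)], // mean 3, variance 10 / 5 = 2
];
foreach ($cases as $case) {
    list($input, $expected) = $case;
    if (abs(stdCalculator($input) - $expected) > 1e-9) {
        throw new Exception('stdCalculator failed for ' . json_encode($input));
    }
}
echo "all stdCalculator cases passed\n";
```

Once a piece like this is covered, any remaining wrong output has to come from somewhere else, for example from how the result is combined with the rating or from the sort.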