如何计算与给定系列具有给定相关性的随机值系列(B)(A)

时间:2012-05-14 22:37:30

标签: php math random correlation

对于一个教育网站,我的目的是让学生在价值系列和他们的收藏中愚弄。例如,学生可以输入两个计算相关性的数组:

$array_x = array(5,3,6,7,4,2,9,5);
$array_y = array(4,3,4,8,3,2,10,5);

echo Correlation($array_x, $array_y); // 0.93439982209434

这个代码完美无缺,可以在这篇文章的底部找到。然而,我现在面临挑战。我想要的是以下内容:

  • 学生输入$ array_x(5,3,6,7,4,2,9,5)
  • 学生输入相关性(0.9)
  • 学生输入$ array_y的边界(例如,1到10之间或50到80之间)
  • 脚本返回一个随机数组(例如:4,3,4,8,3,2,10,5),该数组具有(约)给定的相关性

因此,换句话说,代码必须像:

$array_x = array(5,3,6,7,4,2,9,5);
$boundaries = array(1, 10);
$correlation = 0.9;

echo ySeries($array_x, $boundaries, $correlation); // array(4,3,4,8,3,2,10,5)

在Stackexchange数学论坛上,@ lilya回答(作为图像插入,因为公式的Latex格式似乎不能在stackoverflow上工作):

enter image description here

P.S。用于计算相关性的代码:

function Correlation($arr1, $arr2) {        
  $correlation = 0;  
  $k = SumProductMeanDeviation($arr1, $arr2);
  $ssmd1 = SumSquareMeanDeviation($arr1);
  $ssmd2 = SumSquareMeanDeviation($arr2);
  $product = $ssmd1 * $ssmd2;
  $res = sqrt($product);
  $correlation = $k / $res;

  return $correlation;
}

function SumProductMeanDeviation($arr1, $arr2) {
  $sum = 0;
  $num = count($arr1);
  for($i=0; $i < $num; $i++) {
    $sum = $sum + ProductMeanDeviation($arr1, $arr2, $i);
  }
  return $sum;
}

function ProductMeanDeviation($arr1, $arr2, $item) {
  return (MeanDeviation($arr1, $item) * MeanDeviation($arr2, $item));
}

function SumSquareMeanDeviation($arr) {
  $sum = 0;
  $num = count($arr);
  for($i = 0; $i < $num; $i++) {
    $sum = $sum + SquareMeanDeviation($arr, $i);
  }
  return $sum;
}

function SquareMeanDeviation($arr, $item) {
  return MeanDeviation($arr, $item) * MeanDeviation($arr, $item);
}

function SumMeanDeviation($arr) {
  $sum = 0;
  $num = count($arr);
  for($i = 0; $i < $num; $i++) {
    $sum = $sum + MeanDeviation($arr, $i);
  }
  return $sum;
}

function MeanDeviation($arr, $item) {
  $average = Average($arr);
  return $arr[$item] - $average;
}    

function Average($arr) {
  $sum = Sum($arr);
  $num = count($arr);
  return $sum/$num;
}

function Sum($arr) {
  return array_sum($arr);
}

2 个答案:

答案 0 :(得分:4)

所以,这是你的算法的php实现,它使用Dawkins的黄鼠狼逐渐减少误差,直到达到预期的阈值。

<?php
function sqrMeanDeviation($array, $avg)
{
    $sqrMeanDeviation = 0;
    for($i=0; $i<count($array); $i++)
    {
        $dev = $array[$i] - $avg;
        $sqrMeanDeviation += $dev * $dev;
    }

    return $sqrMeanDeviation;
}

// z values are non-0 an can value between [-abs_z_bound, abs_z_bound]
function random_z_element($abs_z_bound = 1)
{
    $a = (mt_rand() % (2*$abs_z_bound) ) - ($abs_z_bound-1);
    if($a <= 0)
        $a--;
    return $a;
}

// change z a little
function copy_z_weasel($old_array_z, $error_probability = 20 /*error possible is 1 in error_probability*/, $abs_z_bound = 1)
{
    $new_z = array();

    for($i = 0; $i < count($old_array_z); $i++)
        if(mt_rand() % $error_probability == 0 )
            $new_z[$i] = random_z_element($abs_z_bound);
        else
            $new_z[$i] = $old_array_z[$i];

    return $new_z;
}

function correlation_error($array_y, $array_x, $avg_x, $sqrMeanDeviation_x, $correlation)
{
    // checking correlation
    $avg_y = array_sum($array_y)/count($array_y);

    $sqrMeanDeviation_y = 0;
    $covariance_xy = 0;

    for($i=0; $i<count($array_x); $i++)
    {
        $dev_y = $array_y[$i] - $avg_y;
        $sqrMeanDeviation_y += $dev_y * $dev_y;

        $dev_x = $array_x[$i] - $avg_x;
        $covariance_xy += $dev_y * $dev_x;
    }
    $correlation_xy = $covariance_xy/sqrt($sqrMeanDeviation_x*$sqrMeanDeviation_y);
    return abs($correlation_xy - $correlation);
}

function ySeries($array_x, $low_bound, $high_bound, $correlation, $threshold)
{
    $array_y = array();

    $avg_x = array_sum($array_x)/count($array_x);
    $sqrMeanDeviation_x = sqrMeanDeviation($array_x, $avg_x);

    // pre-compute beta
    $beta_x_sQMz = $sqrMeanDeviation_x * sqrt( 1 / ($correlation*$correlation) - 1 );

    $best_array_z = array();
    $n = 0;
    $error = $threshold + 1;

    while($error > $threshold)
    {
        ++$n;

        // generate z 
        $array_z = array();
        if(count($best_array_z) == 0)
            for($i=0; $i<count($array_x); $i++)
                $array_z[$i] = random_z_element();
        else
            $array_z = copy_z_weasel($best_array_z);

        $sqm_z = sqrMeanDeviation($array_z, array_sum($array_z)/count($array_z) );
        // this being 0 implies that for every beta correlation(x,y) = 1 so just give it any random beta
        if($sqm_z)
            $beta = $beta_x_sQMz / $sqm_z;
        else
            $beta = 10;
        // and now we have y
        for($i=0; $i<count($array_x); $i++)
            $array_y[$i] = $array_x[$i] + ($array_z[$i] * $beta);

        // now, change bounds (we could do this afterwards but we want precision and y to be integers)
        // rounding
        $min_y = $array_y[0];
        $max_y = $array_y[0];
        for( $i=1; $i<count($array_x); $i++ )
        {
            if($array_y[$i] < $min_y)
                $min_y = $array_y[$i];
            if($array_y[$i] > $max_y)
                $max_y = $array_y[$i];
        }

        $range = ($high_bound - $low_bound) / ($max_y - $min_y);
        $shift = $low_bound - $min_y;
        for( $i=0; $i<count($array_x); $i++ )
            $array_y[$i] = round($array_y[$i] * $range + $shift);

        // get the error
        $new_error = correlation_error($array_y, $array_x, $avg_x, $sqrMeanDeviation_x, $correlation);

        if($new_error < $error)
        {
            $best_array_z = $array_z;
            $error = $new_error;
        }

    }
    echo "Correlation ", $correlation, " approched within " , $new_error, " in ", $n ," iterations.\n";

    return $array_y;
}

?>

答案 1 :(得分:3)

一个简单的方法,虽然非常低效,但是从给定间隔中的随机数开始并尝试添加更多数字,只要它们不会过多地违反相关性:

function ySeries(array_x, boundaries, correlation) {
  array_y = [random(boundaries)]
  while (len(array_y) < len(array_x)) {
    do {    
      y = random(boundaries)
    } while (Correlation(array_x, array_y + [y]) > correlation + epsilon)

    array_y.push(y)
  }
}

可能效果很好,只要数字很小