对于一个教育网站,我的目的是让学生在价值系列和他们的收藏中愚弄。例如,学生可以输入两个计算相关性的数组:
$array_x = array(5,3,6,7,4,2,9,5);
$array_y = array(4,3,4,8,3,2,10,5);
echo Correlation($array_x, $array_y); // 0.93439982209434
这个代码完美无缺,可以在这篇文章的底部找到。然而,我现在面临挑战。我想要的是以下内容:
因此,换句话说,代码必须像:
$array_x = array(5,3,6,7,4,2,9,5);
$boundaries = array(1, 10);
$correlation = 0.9;
echo ySeries($array_x, $boundaries, $correlation); // array(4,3,4,8,3,2,10,5)
在Stackexchange数学论坛上,@ lilya回答(作为图像插入,因为公式的Latex格式似乎不能在stackoverflow上工作):
P.S。用于计算相关性的代码:
function Correlation($arr1, $arr2) {
$correlation = 0;
$k = SumProductMeanDeviation($arr1, $arr2);
$ssmd1 = SumSquareMeanDeviation($arr1);
$ssmd2 = SumSquareMeanDeviation($arr2);
$product = $ssmd1 * $ssmd2;
$res = sqrt($product);
$correlation = $k / $res;
return $correlation;
}
function SumProductMeanDeviation($arr1, $arr2) {
$sum = 0;
$num = count($arr1);
for($i=0; $i < $num; $i++) {
$sum = $sum + ProductMeanDeviation($arr1, $arr2, $i);
}
return $sum;
}
function ProductMeanDeviation($arr1, $arr2, $item) {
return (MeanDeviation($arr1, $item) * MeanDeviation($arr2, $item));
}
function SumSquareMeanDeviation($arr) {
$sum = 0;
$num = count($arr);
for($i = 0; $i < $num; $i++) {
$sum = $sum + SquareMeanDeviation($arr, $i);
}
return $sum;
}
function SquareMeanDeviation($arr, $item) {
return MeanDeviation($arr, $item) * MeanDeviation($arr, $item);
}
function SumMeanDeviation($arr) {
$sum = 0;
$num = count($arr);
for($i = 0; $i < $num; $i++) {
$sum = $sum + MeanDeviation($arr, $i);
}
return $sum;
}
function MeanDeviation($arr, $item) {
$average = Average($arr);
return $arr[$item] - $average;
}
function Average($arr) {
$sum = Sum($arr);
$num = count($arr);
return $sum/$num;
}
function Sum($arr) {
return array_sum($arr);
}
答案 0 :(得分:4)
所以,这是你的算法的php实现,它使用Dawkins的黄鼠狼逐渐减少误差,直到达到预期的阈值。
<?php
function sqrMeanDeviation($array, $avg)
{
$sqrMeanDeviation = 0;
for($i=0; $i<count($array); $i++)
{
$dev = $array[$i] - $avg;
$sqrMeanDeviation += $dev * $dev;
}
return $sqrMeanDeviation;
}
// z values are non-0 an can value between [-abs_z_bound, abs_z_bound]
function random_z_element($abs_z_bound = 1)
{
$a = (mt_rand() % (2*$abs_z_bound) ) - ($abs_z_bound-1);
if($a <= 0)
$a--;
return $a;
}
// change z a little
function copy_z_weasel($old_array_z, $error_probability = 20 /*error possible is 1 in error_probability*/, $abs_z_bound = 1)
{
$new_z = array();
for($i = 0; $i < count($old_array_z); $i++)
if(mt_rand() % $error_probability == 0 )
$new_z[$i] = random_z_element($abs_z_bound);
else
$new_z[$i] = $old_array_z[$i];
return $new_z;
}
function correlation_error($array_y, $array_x, $avg_x, $sqrMeanDeviation_x, $correlation)
{
// checking correlation
$avg_y = array_sum($array_y)/count($array_y);
$sqrMeanDeviation_y = 0;
$covariance_xy = 0;
for($i=0; $i<count($array_x); $i++)
{
$dev_y = $array_y[$i] - $avg_y;
$sqrMeanDeviation_y += $dev_y * $dev_y;
$dev_x = $array_x[$i] - $avg_x;
$covariance_xy += $dev_y * $dev_x;
}
$correlation_xy = $covariance_xy/sqrt($sqrMeanDeviation_x*$sqrMeanDeviation_y);
return abs($correlation_xy - $correlation);
}
function ySeries($array_x, $low_bound, $high_bound, $correlation, $threshold)
{
$array_y = array();
$avg_x = array_sum($array_x)/count($array_x);
$sqrMeanDeviation_x = sqrMeanDeviation($array_x, $avg_x);
// pre-compute beta
$beta_x_sQMz = $sqrMeanDeviation_x * sqrt( 1 / ($correlation*$correlation) - 1 );
$best_array_z = array();
$n = 0;
$error = $threshold + 1;
while($error > $threshold)
{
++$n;
// generate z
$array_z = array();
if(count($best_array_z) == 0)
for($i=0; $i<count($array_x); $i++)
$array_z[$i] = random_z_element();
else
$array_z = copy_z_weasel($best_array_z);
$sqm_z = sqrMeanDeviation($array_z, array_sum($array_z)/count($array_z) );
// this being 0 implies that for every beta correlation(x,y) = 1 so just give it any random beta
if($sqm_z)
$beta = $beta_x_sQMz / $sqm_z;
else
$beta = 10;
// and now we have y
for($i=0; $i<count($array_x); $i++)
$array_y[$i] = $array_x[$i] + ($array_z[$i] * $beta);
// now, change bounds (we could do this afterwards but we want precision and y to be integers)
// rounding
$min_y = $array_y[0];
$max_y = $array_y[0];
for( $i=1; $i<count($array_x); $i++ )
{
if($array_y[$i] < $min_y)
$min_y = $array_y[$i];
if($array_y[$i] > $max_y)
$max_y = $array_y[$i];
}
$range = ($high_bound - $low_bound) / ($max_y - $min_y);
$shift = $low_bound - $min_y;
for( $i=0; $i<count($array_x); $i++ )
$array_y[$i] = round($array_y[$i] * $range + $shift);
// get the error
$new_error = correlation_error($array_y, $array_x, $avg_x, $sqrMeanDeviation_x, $correlation);
if($new_error < $error)
{
$best_array_z = $array_z;
$error = $new_error;
}
}
echo "Correlation ", $correlation, " approched within " , $new_error, " in ", $n ," iterations.\n";
return $array_y;
}
?>
答案 1 :(得分:3)
一个简单的方法,虽然非常低效,但是从给定间隔中的随机数开始并尝试添加更多数字,只要它们不会过多地违反相关性:
function ySeries(array_x, boundaries, correlation) {
array_y = [random(boundaries)]
while (len(array_y) < len(array_x)) {
do {
y = random(boundaries)
} while (Correlation(array_x, array_y + [y]) > correlation + epsilon)
array_y.push(y)
}
}
可能效果很好,只要数字很小