Suppose I have an array with the following values:
$values = array(48,30,97,61,34,40,51,33,1);
I would like to turn these values into the numbers needed to draw a box plot, like this:
$box_plot_values = array(
'lower_outlier' => 1,
'min' => 8,
'q1' => 32,
'median' => 40,
'q3' => 56,
'max' => 80,
'higher_outlier' => 97,
);
How would I do this in PHP?
Answer 0: (score: 5)
function box_plot_values($array)
{
    $return = array(
        'lower_outlier'  => 0,
        'min'            => 0,
        'q1'             => 0,
        'median'         => 0,
        'q3'             => 0,
        'max'            => 0,
        'higher_outlier' => 0,
    );

    $array_count = count($array);
    sort($array, SORT_NUMERIC);

    $return['min']            = $array[0];
    $return['lower_outlier']  = $return['min'];
    $return['max']            = $array[$array_count - 1];
    $return['higher_outlier'] = $return['max'];
    $middle_index             = floor($array_count / 2);
    $return['median']         = $array[$middle_index]; // Assume an odd # of items
    $lower_values             = array();
    $higher_values            = array();

    // If we have an even number of values, we need some special rules
    if ($array_count % 2 == 0)
    {
        // Handle the even case by averaging the middle 2 items
        $return['median'] = round(($return['median'] + $array[$middle_index - 1]) / 2);

        foreach ($array as $idx => $value)
        {
            if ($idx < ($middle_index - 1)) $lower_values[] = $value; // We need to remove both of the values we used for the median from the lower values
            elseif ($idx > $middle_index) $higher_values[] = $value;
        }
    }
    else
    {
        foreach ($array as $idx => $value)
        {
            if ($idx < $middle_index) $lower_values[] = $value;
            elseif ($idx > $middle_index) $higher_values[] = $value;
        }
    }

    // q1 is the median of the lower half
    $lower_values_count = count($lower_values);
    $lower_middle_index = floor($lower_values_count / 2);
    $return['q1']       = $lower_values[$lower_middle_index];
    if ($lower_values_count % 2 == 0)
        $return['q1'] = round(($return['q1'] + $lower_values[$lower_middle_index - 1]) / 2);

    // q3 is the median of the upper half
    $higher_values_count = count($higher_values);
    $higher_middle_index = floor($higher_values_count / 2);
    $return['q3']        = $higher_values[$higher_middle_index];
    if ($higher_values_count % 2 == 0)
        $return['q3'] = round(($return['q3'] + $higher_values[$higher_middle_index - 1]) / 2);

    // Check if min and max should be capped
    $iqr = $return['q3'] - $return['q1']; // Calculate the interquartile range (IQR)
    // Cap each whisker only when the extreme value lies more than one IQR from its quartile
    if ($return['q1'] - $return['min'] > $iqr) $return['min'] = $return['q1'] - $iqr;
    if ($return['max'] - $return['q3'] > $iqr) $return['max'] = $return['q3'] + $iqr;

    return $return;
}
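As a quick check of the capping step at the end of this function, the arithmetic for the array in the question can be traced by hand. The sketch below hard-codes the quartiles (32 and 56) from the question's expected output; `cap_whiskers` is a hypothetical helper name, not part of the answer, and it compares the distance between each quartile and the extreme value on its side.

```php
<?php
// Hypothetical helper (not from the answer): caps min/max at one IQR
// beyond the quartiles when the extremes lie further away than that.
function cap_whiskers(array $sorted, $q1, $q3)
{
    $iqr = $q3 - $q1;
    $min = $sorted[0];
    $max = $sorted[count($sorted) - 1];

    if ($q1 - $min > $iqr) $min = $q1 - $iqr;
    if ($max - $q3 > $iqr) $max = $q3 + $iqr;

    return array('min' => $min, 'max' => $max, 'iqr' => $iqr);
}

// The question's values, already sorted.
$sorted = array(1, 30, 33, 34, 40, 48, 51, 61, 97);
$caps   = cap_whiskers($sorted, 32, 56);
// iqr = 56 - 32 = 24, min = 32 - 24 = 8, max = 56 + 24 = 80
```

Note that this answer caps at 1 * IQR, which is what reproduces the 8 and 80 asked for in the question; the 1.5 * IQR convention would give different whiskers.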
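Answer 1: (score: 1)
Lilleman's code is brilliant. I really appreciate the way he handles the median and q1/q3. Had I answered this first, I would have dealt with odd and even numbers of values in a harder and unnecessary way, namely four separate if branches for the four possible results of count(values) mod 4. His approach is much tidier, and I really admire his work.
I would like to suggest some improvements to max, min, higher_outlier and lower_outlier. q1 - 1.5 * IQR is only the lower bound, so the 'min' should be the smallest value that is not below this bound. The same goes for 'max'. There may also be more than one outlier. So I have made a few changes based on Lilleman's work. Thanks.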
function box_plot_values($array)
{
    $return = array(
        'lower_outlier'  => 0,
        'min'            => 0,
        'q1'             => 0,
        'median'         => 0,
        'q3'             => 0,
        'max'            => 0,
        'higher_outlier' => 0,
    );

    $array_count = count($array);
    sort($array, SORT_NUMERIC);

    $return['min']            = $array[0];
    $return['lower_outlier']  = array();
    $return['max']            = $array[$array_count - 1];
    $return['higher_outlier'] = array();
    $middle_index             = floor($array_count / 2);
    $return['median']         = $array[$middle_index]; // Assume an odd # of items
    $lower_values             = array();
    $higher_values            = array();

    // If we have an even number of values, we need some special rules
    if ($array_count % 2 == 0)
    {
        // Handle the even case by averaging the middle 2 items
        $return['median'] = round(($return['median'] + $array[$middle_index - 1]) / 2);

        foreach ($array as $idx => $value)
        {
            if ($idx < ($middle_index - 1)) $lower_values[] = $value; // We need to remove both of the values we used for the median from the lower values
            elseif ($idx > $middle_index) $higher_values[] = $value;
        }
    }
    else
    {
        foreach ($array as $idx => $value)
        {
            if ($idx < $middle_index) $lower_values[] = $value;
            elseif ($idx > $middle_index) $higher_values[] = $value;
        }
    }

    $lower_values_count = count($lower_values);
    $lower_middle_index = floor($lower_values_count / 2);
    $return['q1']       = $lower_values[$lower_middle_index];
    if ($lower_values_count % 2 == 0)
        $return['q1'] = round(($return['q1'] + $lower_values[$lower_middle_index - 1]) / 2);

    $higher_values_count = count($higher_values);
    $higher_middle_index = floor($higher_values_count / 2);
    $return['q3']        = $higher_values[$higher_middle_index];
    if ($higher_values_count % 2 == 0)
        $return['q3'] = round(($return['q3'] + $higher_values[$higher_middle_index - 1]) / 2);

    // Check if min and max should be capped
    $iqr = $return['q3'] - $return['q1']; // Calculate the interquartile range (IQR)

    // q1 - 1.5 * IQR is only the lower bound. Every value in the lower half
    // is compared against it: values below the bound are outliers, and the
    // smallest value that is not below the bound is the 'min' of the box plot.
    $return['min'] = $return['q1'] - 1.5 * $iqr;
    foreach ($lower_values as $idx => $value)
    {
        if ($value < $return['min']) // value is below the bound
        {
            // Keeping the index may seem unnecessary, but anyone who wants to
            // know which values are outliers can use it (for example with asort).
            $return['lower_outlier'][$idx] = $value;
        }
        else
        {
            // First value that is not below the bound: this is the 'min'.
            // Stop here so that 'min' stays the smallest such value.
            $return['min'] = $value;
            break;
        }
    }

    // q3 + 1.5 * IQR is the upper bound; same procedure, walking down from the largest value.
    $return['max'] = $return['q3'] + 1.5 * $iqr;
    foreach (array_reverse($higher_values) as $idx => $value)
    {
        if ($value > $return['max'])
        {
            $return['higher_outlier'][$idx] = $value; // $idx is the position in the reversed upper half
        }
        else
        {
            $return['max'] = $value;
            break;
        }
    }

    return $return;
}
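I hope this helps anyone interested in this problem. Please leave me a comment if there is a better way to find out which values are the outliers. Thanks!
Answer 2: (score: 0)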
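One possible way to see which values are outliers is to filter the sorted array against both bounds at once. `array_filter` preserves keys, so each outlier keeps its position in the sorted array. This is only a minimal sketch: it assumes the quartiles have already been computed (32 and 56 are the values for the array in the question), and `find_outliers` is a hypothetical name.

```php
<?php
// Hypothetical helper: returns the values outside the 1.5 * IQR bounds,
// keyed by their position in the (sorted) input array.
function find_outliers(array $sorted, $q1, $q3)
{
    $iqr         = $q3 - $q1;
    $lower_bound = $q1 - 1.5 * $iqr;
    $upper_bound = $q3 + 1.5 * $iqr;

    return array_filter($sorted, function ($value) use ($lower_bound, $upper_bound) {
        return $value < $lower_bound || $value > $upper_bound;
    });
}

// The question's values, already sorted.
$sorted   = array(1, 30, 33, 34, 40, 48, 51, 61, 97);
$outliers = find_outliers($sorted, 32, 56);
// The bounds are -4 and 92, so only 97 (at index 8) falls outside.
```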
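I have a different solution for calculating the lower and higher whiskers. Like ShaoE's solution, it finds the smallest value greater than or equal to the lower bound (Q1 - 1.5 * IQR), and vice versa for the higher bound.
I use array_filter, which iterates over an array, passes each value to a callback function and returns an array containing only the values for which the callback returns true (see php.net's array_filter manual). In this case, the values greater than or equal to the lower bound are returned and used as input for min, which in turn returns the smallest of them.
// get lower whisker
$whiskerMin = min(array_filter($array, function($value) use($quartile1, $iqr) {
    return $value >= $quartile1 - 1.5 * $iqr;
}));
// get higher whisker vice versa
$whiskerMax = max(array_filter($array, function($value) use($quartile3, $iqr) {
    return $value <= $quartile3 + 1.5 * $iqr;
}));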
Note that it leaves the outliers out of account, and I have only tested it with positive values.
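The snippet in this answer relies on `$array`, `$quartile1`, `$quartile3` and `$iqr` being defined elsewhere. A self-contained sketch might wrap both calls in a function that takes the quartiles as arguments; the name `whiskers` is hypothetical, and the quartiles passed in below were worked out by hand with the median-of-halves method from the first answer. The second call uses negative numbers only to exercise the comparison.

```php
<?php
// Hypothetical wrapper around the array_filter approach.
function whiskers(array $values, $quartile1, $quartile3)
{
    $iqr = $quartile3 - $quartile1;

    // Keep only the values that lie within both bounds.
    $inside = array_filter($values, function ($value) use ($quartile1, $quartile3, $iqr) {
        return $value >= $quartile1 - 1.5 * $iqr
            && $value <= $quartile3 + 1.5 * $iqr;
    });

    // Assumes at least one value lies within the bounds;
    // min() and max() fail on an empty array.
    return array('min' => min($inside), 'max' => max($inside));
}

$result = whiskers(array(48, 30, 97, 61, 34, 40, 51, 33, 1), 32, 56);
// Bounds: -4 and 92, so the whiskers are 1 and 61; 97 is left out.

$negative = whiskers(array(-50, -12, -10, -9, -8, -7, -5), -12, -7);
// IQR = 5, bounds: -19.5 and 0.5, so the whiskers are -12 and -5; -50 is left out.
```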