I have a list of numbers. What is the simplest way to calculate the median, mode, and standard deviation of the data set?
Answer 0 (score: 27)
Answer 1 (score: 13)
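Depending on how deep you need to go, erickb's answer may work. But for numerical work in Perl, there is PDL. You create a piddle (an object holding your data) with the pdl function. From there, you can use the operations on this page to compute the statistics you need.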
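Edit: Looking around, I found two function calls that cover the mean, median, and standard deviation (though not the mode). statsover gives statistics over one dimension of a piddle, while stats does the same over the whole piddle.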
use PDL;
my @data = (2, 4, 4, 4, 5, 5, 7, 9);   # example data
my $piddle = pdl @data;
my ($mean,$prms,$median,$min,$max,$adev,$rms) = statsover $piddle;
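If PDL isn't installed, the same quantities, plus the mode (which statsover doesn't return), can be computed with core Perl alone. The following is a minimal sketch, not part of the original answer: `summarize` is a hypothetical helper name, ties for the mode are all reported, and the standard deviation is the population form (dividing by n, not n-1).

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Hypothetical helper: summarize(@numbers) returns a hash ref with the
# mean, median, modes (numerically sorted) and population std deviation.
sub summarize {
    my @v = @_;
    die "no values\n" unless @v;
    my $n    = @v;
    my $mean = sum(@v) / $n;

    # Two-pass variance: average squared deviation from the mean (divide by n).
    my $var = sum(map { ($_ - $mean) ** 2 } @v) / $n;

    # Median: middle element, or mean of the two middle elements.
    my @s      = sort { $a <=> $b } @v;
    my $mid    = int($n / 2);
    my $median = $n % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;

    # Mode(s): every value that occurs the maximum number of times.
    my %count;
    $count{$_}++ for @v;
    my $top   = max(values %count);
    my @modes = sort { $a <=> $b } grep { $count{$_} == $top } keys %count;

    return {
        mean   => $mean,
        median => $median,
        modes  => \@modes,
        stdev  => sqrt($var),
    };
}

my $r = summarize(2, 4, 4, 4, 5, 5, 7, 9);
printf "mean %g, median %g, mode %s, stdev %g\n",
    $r->{mean}, $r->{median}, join(',', @{ $r->{modes} }), $r->{stdev};
```

For this sample the mean is 5, the median is 4.5 (the average of the two middle values 4 and 5), the mode is 4, and the population standard deviation is 2.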
Answer 2 (score: 13)
#!/usr/bin/perl
#
# stdev - figure N, min, max, median, mode, mean, & std deviation
#
# pull out all the real numbers in the input
# stream and run standard calculations on them.
# they may be intermixed with other text, need
# not be on the same or different lines, and
# can be in scientific notation (avogadro=6.02e23).
# they also admit a leading + or -.
#
# Tom Christiansen
# tchrist@perl.com
use strict;
use warnings;
use List::Util qw< min max >;
sub by_number {
    if ($a < $b) { -1 } elsif ($a > $b) { 1 } else { 0 }
}
#
my $number_rx = qr{
    # leading sign, positive or negative
    (?: [+-] ? )

    # mantissa
    (?= [0123456789.] )
    (?:
        # "N" or "N." or "N.N"
        (?:
            (?: [0123456789] + )
            (?:
                (?: [.] )
                (?: [0123456789] * )
            ) ?
        |
            # ".N", no leading digits
            (?:
                (?: [.] )
                (?: [0123456789] + )
            )
        )
    )

    # exponent (optional: the empty alternative matches nothing)
    (?:
        (?: [Ee] )
        (?:
            (?: [+-] ? )
            (?: [0123456789] + )
        )
    |
    )
}x;
my $n = 0;
my $sum = 0;
my @values = ();
my %seen = ();
while (<>) {
    while (/($number_rx)/g) {
        $n++;
        my $num = 0 + $1;   # 0+ is so numbers in alternate form count as same
        $sum += $num;
        push @values, $num;
        $seen{$num}++;
    }
}
die "no values" if $n == 0;
my $mean = $sum / $n;
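# Sum squared deviations from the mean (a second pass over @values).
# The one-pass form E[x^2] - mean^2 can round to a tiny negative
# number for near-constant data, and Perl's sqrt() dies on negatives.
my $sqsum = 0;
for (@values) {
    $sqsum += ( $_ - $mean ) ** 2;
}
$sqsum /= $n;   # population variance (divide by n, not n-1)
my $stdev = sqrt($sqsum);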
my $max_seen_count = max values %seen;
# sort the tied modes so the output order doesn't depend on hash order
my @modes = sort by_number grep { $seen{$_} == $max_seen_count } keys %seen;
my $mode = @modes == 1
    ? $modes[0]
    : "(" . join(", ", @modes) . ")";
$mode .= ' @ ' . $max_seen_count;
my $median;
my $mid = int(@values / 2);
my @sorted_values = sort by_number @values;
if (@values % 2) {
    $median = $sorted_values[ $mid ];
} else {
    $median = ($sorted_values[$mid-1] + $sorted_values[$mid]) / 2;
}
my $min = min @values;
my $max = max @values;
printf "n is %d, min is %g, max is %g\n", $n, $min, $max;
printf "mode is %s, median is %g, mean is %g, stdev is %g\n",
    $mode, $median, $mean, $stdev;
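The verbose number pattern in the script is easier to check in isolation. Below is a condensed restatement of it (our own rewrite, intended to match the same strings: an optional sign, then "N", "N.", "N.N" or ".N", then an optional exponent), wrapped in a hypothetical `extract_numbers` helper that mirrors the script's inner `while (/.../g)` loop on a single string.

```perl
use strict;
use warnings;

# Condensed restatement of the script's $number_rx (assumed equivalent):
# optional sign; lookahead for a digit or dot; "N", "N.", "N.N" or ".N";
# optional exponent such as e23, E-5, e+7.
my $number_rx = qr/[+-]?(?=[0-9.])(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?/;

# Hypothetical helper: return every number found in a string, numified.
sub extract_numbers {
    my ($text) = @_;
    my @found;
    while ($text =~ /($number_rx)/g) {
        push @found, 0 + $1;    # 0+ turns "7." and ".5" into plain numbers
    }
    return @found;
}

my @nums = extract_numbers("avogadro=6.02e23, x -3 and .5 then 7.");
print join(" ", @nums), "\n";
```

Words such as "avogadro" or "then" are skipped because the lookahead requires a digit or a dot before any mantissa is attempted, and a lone "." never matches since both mantissa branches need at least one digit.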