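What is the best way to compute min / avg / max / std-dev for some random data in the shell?
What if each line has multiple columns and the statistics are needed for each column?
Sample input (derived from processing hping output); columns 3, 4 and 5 are the ones of interest: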
0 145.5 146 = 75 + 71
1 142.7 142 = 72 + 70
2 140.7 140 = 70 + 70
3 146.7 146 = 76 + 70
4 148.3 148 = 77 + 71
5 157.5 157 = 87 + 70
6 167.1 167 = 96 + 71
7 166.3 166 = 95 + 71
8 167.7 167 = 97 + 70
9 159.0 159 = 88 + 71
10 156.7 156 = 86 + 70
11 154.9 155 = 84 + 71
12 151.9 152 = 81 + 71
13 157.3 157 = 86 + 71
14 155.0 155 = 84 + 71
15 157.7 158 = 87 + 71
16 156.6 156 = 86 + 70
(Note that this input is an infinite, real-time stream.)
Answer 0 (score: 2)
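I suggest you use Perl and keep running totals of N, Σx and Σx², along with the minimum and maximum x values. Every value you need can be derived from these.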
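As a brief aside, here is a sketch of why those three running totals are enough. The symbols \bar{x} (mean) and \sigma (standard deviation) are notation introduced here for illustration. Expanding the definition of the variance gives a form that needs only N, Σx and Σx²:

```latex
\begin{align*}
\bar{x} &= \frac{1}{N}\sum_{i=1}^{N} x_i \\
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2
          = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - 2\bar{x}\cdot\frac{1}{N}\sum_{i=1}^{N} x_i + \bar{x}^2
          = \frac{\sum x^2}{N} - \bar{x}^2 \\
\sigma &= \sqrt{\frac{\sum x^2}{N} - \bar{x}^2}
\end{align*}
```

This is the population standard deviation (divisor N). If the sample standard deviation is wanted instead, the same totals give s = sqrt((Σx² − N·x̄²) / (N − 1)) for N > 1. Because the formula subtracts two nearly equal quantities, the last few digits of the result can drift on a long stream.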
This example demonstrates the approach. It dumps the current statistics after reading each line of input.
use strict;
use warnings;

# Running totals per column: count, Σx, Σx², minimum and maximum
my ($n, @sum, @sumsq, @min, @max);

while (<DATA>) {
    # Extract every numeric field from the line
    my @columns = /[0-9.]+/g;
    my (@mean, @std_dev);
    ++$n;

    # The columns of interest are the 3rd, 4th and 5th numeric fields
    for my $i (0 .. 2) {
        my $x   = $columns[$i + 2];
        my $xsq = $x * $x;
        $sum[$i]   += $x;
        $sumsq[$i] += $xsq;
        $mean[$i]    = $sum[$i] / $n;
        # Population standard deviation: sqrt(Σx²/N - mean²)
        $std_dev[$i] = sqrt($sumsq[$i]/$n - $mean[$i] * $mean[$i]);
        $min[$i] = $x unless defined $min[$i] and $min[$i] <= $x;
        $max[$i] = $x unless defined $max[$i] and $max[$i] >= $x;
    }

    # Dump the statistics accumulated so far
    print "min = @min\n";
    print "max = @max\n";
    print "mean = @mean\n";
    print "std_dev = @std_dev\n";
    print "---\n";
}
__DATA__
0 145.5 146 = 75 + 71
1 142.7 142 = 72 + 70
2 140.7 140 = 70 + 70
3 146.7 146 = 76 + 70
4 148.3 148 = 77 + 71
5 157.5 157 = 87 + 70
6 167.1 167 = 96 + 71
7 166.3 166 = 95 + 71
8 167.7 167 = 97 + 70
9 159.0 159 = 88 + 71
10 156.7 156 = 86 + 70
11 154.9 155 = 84 + 71
12 151.9 152 = 81 + 71
13 157.3 157 = 86 + 71
14 155.0 155 = 84 + 71
15 157.7 158 = 87 + 71
16 156.6 156 = 86 + 70
Output
min = 146 75 71
max = 146 75 71
mean = 146 75 71
std_dev = 0 0 0
---
min = 142 72 70
max = 146 75 71
mean = 144 73.5 70.5
std_dev = 2 1.5 0.5
---
min = 140 70 70
max = 146 75 71
mean = 142.666666666667 72.3333333333333 70.3333333333333
std_dev = 2.4944382578501 2.05480466765642 0.47140452079146
---
min = 140 70 70
max = 146 76 71
mean = 143.5 73.25 70.25
std_dev = 2.59807621135332 2.38484800354236 0.433012701892219
---
min = 140 70 70
max = 148 77 71
mean = 144.4 74 70.4
std_dev = 2.93938769133971 2.60768096208109 0.489897948555485
---
min = 140 70 70
max = 157 87 71
mean = 146.5 76.1666666666667 70.3333333333333
std_dev = 5.40832691319598 5.39804491356711 0.47140452079146
---
min = 140 70 70
max = 167 96 71
mean = 149.428571428571 79 70.4285714285714
std_dev = 8.74817765279739 8.55235974119756 0.494871659305337
---
min = 140 70 70
max = 167 96 71
mean = 151.5 81 70.5
std_dev = 9.8488578017961 9.59166304662544 0.5
---
min = 140 70 70
max = 167 97 71
mean = 153.222222222222 82.7777777777778 70.4444444444444
std_dev = 10.4857339888036 10.3470637571759 0.496903995000609
---
min = 140 70 70
max = 167 97 71
mean = 153.8 83.3 70.5
std_dev = 10.0975244490914 9.94032192637645 0.5
---
min = 140 70 70
max = 167 97 71
mean = 154 83.5454545454545 70.4545454545455
std_dev = 9.64836302648838 9.50945592902742 0.497929597732158
---
min = 140 70 70
max = 167 97 71
mean = 154.083333333333 83.5833333333333 70.5
std_dev = 9.24173805202349 9.10547759440561 0.5
---
min = 140 70 70
max = 167 97 71
mean = 153.923076923077 83.3846153846154 70.5384615384615
std_dev = 8.89651218141581 8.77530154238378 0.498518515262866
---
min = 140 70 70
max = 167 97 71
mean = 154.142857142857 83.5714285714286 70.5714285714286
std_dev = 8.60943952761114 8.48287590817347 0.494871659305337
---
min = 140 70 70
max = 167 97 71
mean = 154.2 83.6 70.6
std_dev = 8.32025640630559 8.19593395125498 0.489897948558269
---
min = 140 70 70
max = 167 97 71
mean = 154.4375 83.8125 70.625
std_dev = 8.10839649684202 7.97824189593171 0.484122918275927
---
min = 140 70 70
max = 167 97 71
mean = 154.529411764706 83.9411764705882 70.5882352941177
std_dev = 7.874886718579 7.75712642546343 0.492152956783766
---