Calculating the mean and standard deviation of a column with awk

Date: 2018-12-14 08:54:59

Tags: awk mean standard-deviation

I have this file:

Took:  15.473214149475098  seconds
Took:  12.94953465461731  seconds
Took:  2.235722780227661  seconds
Took:  40.53083419799805  seconds
Took:  21.840606212615967  seconds
Took:  35.777870893478394  seconds
Took:  13.153780221939087  seconds
Took:  2.966165781021118  seconds
Took:  35.54965615272522  seconds

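I want to compute the mean and standard deviation of the times directly in the terminal. Can awk help? I'm not very familiar with it. To get just the column with the numeric values, I tried splitting the file like this: cat <filename> | awk -F "Took:" {print$2}, but it returned the entire contents of the file.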
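The attempt above most likely prints every line because the awk program is not quoted: the shell expands `$2` itself (normally to an empty string in an interactive terminal) before awk runs, so awk receives the program `{print}`, which prints whole lines. A minimal sketch of the quoted version, assuming the time is the second whitespace-separated field as in the sample file; the function name `extract_times` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print field 2 (the time) of every input line.
# The single quotes stop the shell from expanding $2 before awk sees it.
extract_times() {
    awk '{ print $2 }' "$@"
}

# Example using two lines in the question's format.
printf 'Took:  15.473214149475098  seconds\nTook:  12.94953465461731  seconds\n' | extract_times
```

Quoting alone also fixes the original command, `awk -F 'Took:' '{print $2}'`, but with that separator each printed field still contains the surrounding spaces and the word `seconds`.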

5 answers:

Answer 0 (score: 3)

Try the following to get the mean of the second column:

awk '$2!=""{sum+=$2; count++} END{print sum/count}' Input_file

EDIT:

awk '$2!=""{count++; sum+=$2; y+=$2^2} END{mean=sum/count; v=y/count-mean^2; sq=(v>0)?sqrt(v):0; print "Mean = " mean ORS "S.D = " sq}' Input_file
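The `$2!=""` test only skips lines with fewer than two fields. If the file may also contain other non-empty lines (headers, log noise), matching only the `Took:` lines is a more explicit filter. A minimal sketch, assuming every relevant line starts with the field `Took:`; the function name `took_stats` is hypothetical, and like the answers here it reports the population standard deviation:

```shell
#!/bin/sh
# Hypothetical helper: mean and population SD of field 2, counting only
# lines whose first field is exactly "Took:".
took_stats() {
    awk '$1 == "Took:" { n++; sum += $2; sumsq += $2 * $2 }
         END {
             if (n == 0) { print "no data"; exit 1 }
             mean = sum / n
             var = sumsq / n - mean * mean
             if (var < 0) var = 0   # clamp tiny negative rounding error
             printf "Mean = %.4f\nS.D = %.4f\n", mean, sqrt(var)
         }' "$@"
}

# The header line and the blank line are ignored.
printf 'header\nTook:  2  seconds\n\nTook:  4  seconds\n' | took_stats
```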

Answer 1 (score: 3)

The Wikipedia page on standard deviation has an interesting section, "Rapid calculation methods". Welford's algorithm is of particular interest, being both simple and numerically stable:

A_0, Q_0 = 0, 0
for k in (1, ...):
    j = k-1
    A_k = A_j + (X_k-A_j)/k
    Q_k = Q_j + (X_k-A_j)*(X_k-A_k)

where A_k is the running mean and Q_k is related to the population variance σ² by Q_k = σ²*k.
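Since Q_n is the sum of squared deviations from the mean after n values, the same recurrence gives either variance; which one to report depends on whether the numbers are the whole population or a sample (the answers on this page all divide by n):

```latex
Q_n = \sum_{k=1}^{n} (X_k - A_n)^2, \qquad
\sigma^2 = \frac{Q_n}{n} \quad \text{(population)}, \qquad
s^2 = \frac{Q_n}{n-1} \quad \text{(sample, } n \ge 2\text{)}
```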

With this background, we can write:

$ awk 'BEGIN{a=0;q=0}{x=$2;b=a+(x-a)/NR;q+=(x-a)*(x-b);a=b}END{print a,sqrt(q/NR)}' file
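The one-liner above prints the population SD, sqrt(Q/n). A sketch of the same recurrence as a reusable shell function that also prints the sample SD, sqrt(Q/(n-1)); the function name `welford_stats` and the three-column output are assumptions, not part of the original answer:

```shell
#!/bin/sh
# Hypothetical helper: Welford's running mean/variance over field 2.
# Prints: mean, population SD (Q/n), sample SD (Q/(n-1)).
welford_stats() {
    awk '{
             n++
             x = $2
             d = x - a          # X_k - A_{k-1}
             a += d / n         # A_k = A_{k-1} + (X_k - A_{k-1})/k
             q += d * (x - a)   # Q_k = Q_{k-1} + (X_k - A_{k-1})(X_k - A_k)
         }
         END {
             if (n < 2) { print "need at least two values"; exit 1 }
             printf "%.4f %.4f %.4f\n", a, sqrt(q / n), sqrt(q / (n - 1))
         }' "$@"
}

# Classic example data: mean 5, population SD 2.
printf 'Took: %s seconds\n' 2 4 4 4 5 5 7 9 | welford_stats
```

For the question's file this would be called as `welford_stats file`.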

Answer 2 (score: 2)

Another quick way:

$ awk '{s+=$2; ss+=$2^2} END{print m=s/NR, sqrt(ss/NR-m^2)}' file

20.053 13.4924
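This one-pass formula, Var = E[X²] - (E[X])², is concise, but it subtracts two nearly equal numbers when the values are large compared with their spread, so floating-point rounding can swamp the result. That cancellation is what Welford's recurrence avoids. A small sketch comparing the two on hypothetical data, four times of the form 1000000000 + {4, 7, 13, 16}, whose population SD is sqrt(22.5), about 4.7434; the function name `compare_sd` is hypothetical:

```shell
#!/bin/sh
# Hypothetical comparison of two one-pass population SD formulas on field 2.
compare_sd() {
    awk '{
             n++; x = $2
             s += x; ss += x * x                      # naive: sum and sum of squares
             d = x - a; a += d / n; q += d * (x - a)  # Welford update
         }
         END {
             m = s / n
             v = ss / n - m * m                       # E[X^2] - (E[X])^2
             naive = (v < 0) ? "negative variance" : sprintf("%.4f", sqrt(v))
             print "naive:   " naive
             printf "welford: %.4f\n", sqrt(q / n)
         }' "$@"
}

# Large offset, small spread: with double precision the naive line is
# dominated by rounding in x * x, while Welford gives 4.7434.
printf 'Took: %s seconds\n' 1000000004 1000000007 1000000013 1000000016 | compare_sd
```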

Answer 3 (score: 1)

$ cat tst.awk
{ numbers[NR] = $2; sum += $2 }
END {
    n = NR    # number of values read; avoids relying on length() of an array
    mean = sum / n
    # calculate std deviation
    for (i in numbers) {
        dif = numbers[i] - mean
        std += dif ^ 2
    }
    std = sqrt(std / n)

    print "Mean: " mean
    print "Standard Deviation: " std
}
$
$ awk -f tst.awk file
Mean: 20.053
Standard Deviation: 13.4924
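Storing every value in an array is fine for a file this size. Another two-pass variant reads the file twice instead, using the common `NR == FNR` idiom, which is true only while awk reads its first file argument. A sketch, assuming the input is a non-empty regular file that can be named twice on the command line (not a pipe); `twopass_stats` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: two-pass mean and population SD without an array.
# Pass 1 (NR == FNR) sums field 2; pass 2 sums squared deviations.
twopass_stats() {
    awk 'NR == FNR { n++; sum += $2; next }
         FNR == 1  { mean = sum / n }
         { d = $2 - mean; ss += d * d }
         END { printf "Mean: %.4f\nStandard Deviation: %.4f\n", mean, sqrt(ss / n) }' "$1" "$1"
}

# Example on a temporary file (mktemp is assumed to be available).
tmp=$(mktemp)
printf 'Took: %s seconds\n' 2 4 4 4 5 5 7 9 > "$tmp"
twopass_stats "$tmp"
rm -f "$tmp"
```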

Answer 4 (score: 1)

Using a Perl one-liner:

> cat dada.txt 
Took:  15.473214149475098  seconds
Took:  12.94953465461731  seconds
Took:  2.235722780227661  seconds
Took:  40.53083419799805  seconds
Took:  21.840606212615967  seconds
Took:  35.777870893478394  seconds
Took:  13.153780221939087  seconds
Took:  2.966165781021118  seconds
Took:  35.54965615272522  seconds
> perl -lane '$s+=$F[1];push(@a,$F[1]); END { $m=$s/@a; $sd+=($_-$m)**2 for(@a);$sd=sqrt($sd/@a); print "Mean:$m\nStandard Deviation:$sd"} ' dada.txt
Mean:20.0530427826775
Standard Deviation:13.4923983082523
>