I have this file:
Took: 15.473214149475098 seconds
Took: 12.94953465461731 seconds
Took: 2.235722780227661 seconds
Took: 40.53083419799805 seconds
Took: 21.840606212615967 seconds
Took: 35.777870893478394 seconds
Took: 13.153780221939087 seconds
Took: 2.966165781021118 seconds
Took: 35.54965615272522 seconds
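I would like to compute the mean and standard deviation of these times directly in the terminal. Can awk help? I am not very familiar with it. I tried splitting the file like this to get only the column with the numeric values:
cat <filename> | awk -F "Took:" {print$2}
but it just returned the entire content of the file.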
Answer 0 (score: 3)
Try the following to get the mean of the second column.
awk '$2!=""{sum+=$2;count++} END{if(count)print sum/count}' Input_file
Edit: to print the standard deviation as well:
awk '$2!=""{count++;sum+=$2;y+=$2^2} END{if(count){m=sum/count;v=y/count-m^2;if(v<0)v=0;print "Mean = " m ORS "S.D = " sqrt(v)}}' Input_file
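As for why the attempt in the question printed every line: the awk program there was not quoted, so the shell most likely expanded $2 itself (normally to an empty string in an interactive shell) and awk received just {print}, which prints the whole line. A minimal sketch of the quoted form follows, assuming the fields are separated by whitespace; took_sample.txt is a made-up file name used only for this illustration.

```shell
# Sample data in the same format as the question (first three lines only).
# took_sample.txt is a hypothetical file name for this sketch.
printf 'Took: 15.473214149475098 seconds\nTook: 12.94953465461731 seconds\nTook: 2.235722780227661 seconds\n' > took_sample.txt

# Single quotes keep the shell away from $2, so awk sees it as
# "second whitespace-separated field" and prints only the numbers.
values=$(awk '{print $2}' took_sample.txt)
echo "$values"
```

With the default field separator there is no need for -F at all: "Took:" is field 1, the number is field 2, and "seconds" is field 3.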
Answer 1 (score: 3)
The Wikipedia page on standard deviation has an interesting section, "Rapid calculation methods". Welford's algorithm is of particular interest because it is simple and numerically stable:
A_0, Q_0 = 0, 0
for k in (1, ...):
    j = k-1
    A_k = A_j + (X_k-A_j)/k
    Q_k = Q_j + (X_k-A_j)*(X_k-A_k)
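where A_k is the running mean of the first k values and Q_k is related to their population variance σ² by Q_k = σ²*k.
With this theoretical background, we can write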
$ awk 'BEGIN{a=0;q=0}{x=$2;b=a+(x-a)/NR;q+=(x-a)*(x-b);a=b}END{print a,sqrt(q/NR)}' file
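To illustrate what "numerically stable" buys in practice, here is a small sketch that compares Welford's update with the textbook shortcut "mean of squares minus square of mean" on four made-up values that have a large offset and a small spread. The data and the file name offset_sample.txt are assumptions for the illustration only; the exact population variance of these four values is 22.5.

```shell
# Four made-up values with a large offset and a small spread.
# Deviations from the mean are -6, -3, 3, 6, so the variance is 90/4 = 22.5.
printf 'Took: 1000000004 seconds\nTook: 1000000007 seconds\nTook: 1000000013 seconds\nTook: 1000000016 seconds\n' > offset_sample.txt

# Welford's update, same recurrence as the one-liner above (prints the variance).
welford=$(awk '{x=$2; b=a+(x-a)/NR; q+=(x-a)*(x-b); a=b} END{print q/NR}' offset_sample.txt)

# Shortcut: mean of the squares minus the square of the mean.
naive=$(awk '{s+=$2; ss+=$2^2} END{m=s/NR; print ss/NR-m^2}' offset_sample.txt)

echo "Welford variance:        $welford"
echo "Sum-of-squares variance: $naive"
```

The first line should report 22.5. The second typically lands far from it, because the squares are close to 1e18, where double-precision numbers are spaced 128 apart, and the subtraction cancels almost all significant digits. For the timings in the question, which are small and well spread, both approaches agree to the printed precision.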
Answer 2 (score: 2)
Another quick way:
$ awk '{s+=$2; ss+=$2^2} END{print m=s/NR, sqrt(ss/NR-m^2)}' file
20.053 13.4924
Answer 3 (score: 1)
$ cat tst.awk
{ numbers[NR] = $2; sum += $2 }
END {
    mean = sum / length(numbers)
    # calculate the standard deviation
    for (i in numbers) {
        dif = numbers[i] - mean
        std += dif ^ 2
    }
    std = sqrt(std / length(numbers))
    print "Mean: " mean
    print "Standard Deviation: " std
}
$
$ awk -f tst.awk file
Mean: 20.053
Standard Deviation: 13.4924
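The awk results above show only six significant digits because awk formats numbers with "%.6g" by default when printing or concatenating them. The following is a sketch of the same two-pass idea with an explicit printf format; it also counts only lines whose first field is "Took:", so blank or unrelated lines would not skew the result. The file name took_all.txt and the choice of four decimals are assumptions made for the illustration.

```shell
# The nine timings from the question, written to a hypothetical file.
cat > took_all.txt <<'EOF'
Took: 15.473214149475098 seconds
Took: 12.94953465461731 seconds
Took: 2.235722780227661 seconds
Took: 40.53083419799805 seconds
Took: 21.840606212615967 seconds
Took: 35.777870893478394 seconds
Took: 13.153780221939087 seconds
Took: 2.966165781021118 seconds
Took: 35.54965615272522 seconds
EOF

result=$(awk '
$1 == "Took:" {             # ignore blank or unrelated lines
    n++
    v[n] = $2
    sum += $2
}
END {
    if (n == 0) exit 1      # nothing to average
    mean = sum / n
    for (i = 1; i <= n; i++) ss += (v[i] - mean) ^ 2
    # population standard deviation, printed with four decimals
    printf "Mean: %.4f\nStandard Deviation: %.4f\n", mean, sqrt(ss / n)
}' took_all.txt)
echo "$result"
```

This should print "Mean: 20.0530" and "Standard Deviation: 13.4924". Keeping an explicit counter n instead of relying on NR or length(numbers) also avoids depending on length() accepting an array, which not every awk implementation supports.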
Answer 4 (score: 1)
Using a Perl one-liner:
> cat dada.txt
Took: 15.473214149475098 seconds
Took: 12.94953465461731 seconds
Took: 2.235722780227661 seconds
Took: 40.53083419799805 seconds
Took: 21.840606212615967 seconds
Took: 35.777870893478394 seconds
Took: 13.153780221939087 seconds
Took: 2.966165781021118 seconds
Took: 35.54965615272522 seconds
> perl -lane '$s+=$F[1];push(@a,$F[1]); END { $m=$s/@a; $sd+=($_-$m)**2 for(@a);$sd=sqrt($sd/@a); print "Mean:$m\nStandard Deviation:$sd"} ' dada.txt
Mean:20.0530427826775
Standard Deviation:13.4923983082523
>