Question

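I'm trying to write a script that computes the average over several rows.

The number of rows per average depends on how many samples I have, and that varies.

Here is an example of these files: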

24  1  2.505
24  2  0.728
24  3  0.681
48  1  2.856
48  2  2.839
48  3  2.942
96  1  13.040
96  2  12.922
96  3  13.130
192 1  50.629
192 2  51.506
192 3  51.016

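The average is computed over column 3.

The second column is the sample index; in this particular case there are 3 samples.

So I should get 4 values here.

One average per 3 rows.

I tried something like this: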

count=3;
total=0; 

for i in $( awk '{ print $3; }' ${file} )
do 
    for j in 1 2 3
    do
    total=$(echo $total+$i | bc )
    done
    echo "scale=2; $total / $count" | bc
done

But it doesn't give me the right answer; instead, I think it computes the average for each group of three rows.

Expected output

24  1.3046      
48  2.879       
96  13.0306     
192 51.0503

Answer 1

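You can use the following awk script. Note that it groups on the second column, so it gives one average per sample index rather than one per value of the first column: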

awk '{t[$2]+=$3;n[$2]++}END{for(i in t){print i,t[i]/n[i]}}' file

Output:

1 17.2575
2 16.9988
3 16.9423

This is better explained as a multi-line script with comments:

# On every line of input
{
    # sum up the value of the 3rd column in an array t
    # which is indexed by the 2nd column
    t[$2]+=$3
    # Increment the number of lines having the same value of
    # the 2nd column
    n[$2]++
}
# At the end of input
END {
    # Iterate through the array t
    for(i in t){
        # Print the key (value of the 2nd column) along with its average
        print i,t[i]/n[i]
    }
}
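
The script above keys its arrays on the second column. If the goal is instead one average per value of the first column, as in the question's expected output, a minimal sketch is to key on $1. The `order` array and the temporary file are assumptions added here for illustration; `order` is only there because `for (i in t)` does not guarantee any particular iteration order.

```shell
# Sample data from the question, written to a temporary file (illustrative only).
file=$(mktemp)
cat > "$file" <<'EOF'
24  1  2.505
24  2  0.728
24  3  0.681
48  1  2.856
48  2  2.839
48  3  2.942
96  1  13.040
96  2  12.922
96  3  13.130
192 1  50.629
192 2  51.506
192 3  51.016
EOF

# Group on column 1 and print the groups in first-seen order.
result=$(awk '
    !($1 in n) { order[++k] = $1 }   # record each new key once, in input order
    { t[$1] += $3; n[$1]++ }         # running sum and row count per key
    END {
        for (j = 1; j <= k; j++)
            print order[j], t[order[j]] / n[order[j]]
    }' "$file")

echo "$result"
rm -f "$file"
```

Because the sums are kept per key, this variant does not require the rows of a group to be adjacent in the file, and the number of samples per group may differ.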

Answer 2

Apparently I'm bringing a third view of the problem. In awk:

$ awk 'NR>1 && $1!=p{print p, s/c; c=s=0} {s+=$3;c++;p=$1} END {print p, s/c}' file
24 1.30467
48 2.879
96 13.0307
192 51.0503
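
The one-liner relies on all rows for a given value of the first column being contiguous, as they are in the sample file. Spelled out with comments, it can be read roughly as follows; the variable names p, s and c come from the one-liner, while the temporary file and its shortened data are assumptions for illustration.

```shell
# First two groups of the sample data, written to a temporary file (illustrative only).
file=$(mktemp)
printf '%s\n' '24 1 2.505' '24 2 0.728' '24 3 0.681' \
              '48 1 2.856' '48 2 2.839' '48 3 2.942' > "$file"

result=$(awk '
    # When the key in column 1 changes, print the finished group and reset.
    NR > 1 && $1 != p { print p, s / c; c = s = 0 }
    # Accumulate the sum (s) and row count (c), and remember the current key (p).
    { s += $3; c++; p = $1 }
    # No later row triggers the print for the last group, so flush it here.
    END { print p, s / c }
' "$file")

echo "$result"
rm -f "$file"
```

This streaming form keeps only one running sum in memory and preserves input order, at the cost of needing the input grouped by the first column.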
