AWK: averaging columns of different lengths from multiple files

Time: 2017-11-10 00:39:34

Tags: bash awk average

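I need to compute averages from columns in multiple files, but the columns have different numbers of rows. I think awk is the best tool for this, but anything in bash is fine. A solution for one column per file is OK. A solution that also works for files with multiple columns would be even better.

Example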

file_1:

10
20
30
40
50

file_2:

20
30
40

Expected result:

15
25
35
40
50

2 answers:

Answer 0: (score: 0)

I have prepared the following bash script for you; I hope it helps.

Let me know if you have any questions.

#!/usr/bin/env bash

# Check that both files given as parameters exist
if [ ! -f "$1" ] || [ ! -f "$2" ]; then
    echo "ERROR: file> $1 or file> $2 is missing" >&2
    exit 1
fi

# Average the two files row by row.
# Rows present in only one file are printed unchanged, so the inputs
# need no padding and are never modified.
awk '
    # First file: remember every value and the number of rows
    FNR == NR { a[FNR] = $1; n = FNR; next }
    # Second file: average with the first file where that row exists
    {
        if (FNR in a) print (a[FNR] + $1) / 2
        else          print $1
        seen = FNR
    }
    # Rows that exist only in the first file
    END { for (i = seen + 1; i <= n; i++) print a[i] }
' "$1" "$2"

Answer 1: (score: 0)

awk makes this task easy:

awk '{a[FNR]+=$0; n[FNR]++} FNR>max{max=FNR} END{for(i=1;i<=max;i++) print a[i]/n[i]}' file_1 file_2

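This approach also works for more than two files.

Brief explanation:

  • FNR is the record (line) number within the current input file.
  • a[FNR] accumulates the sum of the values found on that row across all files.
  • n[FNR] counts how many files contributed a value on that row.
  • The for loop in the END block prints a[i]/n[i], the average for each row.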
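
The question also asks about files with several columns. The same sum-and-count idea can be extended by indexing the arrays on both the row and the column. The following is only a sketch under some assumptions: the function name avg_columns is hypothetical (it does not come from either answer), fields are assumed to be whitespace-separated numbers, and a cell that is missing from a shorter file is simply left out of the average.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of either answer above.
# Averages every column, row by row, across any number of files.
# Assumption: whitespace-separated numeric fields.
avg_columns() {
    awk '
    {
        for (c = 1; c <= NF; c++) {
            sum[FNR, c] += $c        # running total for this (row, column)
            cnt[FNR, c]++            # how many files had this cell
        }
        if (FNR > rows) rows = FNR   # length of the longest file
        if (NF > cols)  cols = NF    # width of the widest row
    }
    END {
        for (r = 1; r <= rows; r++) {
            line = ""
            for (c = 1; c <= cols; c++) {
                if ((r, c) in cnt) {
                    sep = (c > 1) ? OFS : ""
                    line = line sep sum[r, c] / cnt[r, c]
                }
            }
            print line
        }
    }' "$@"
}
```

Called as avg_columns file_1 file_2 on the sample files from the question, this should print 15, 25, 35, 40 and 50, one value per line. Counting each cell separately, rather than dividing by the number of files, is what keeps the trailing rows of the longer file from being pulled toward zero.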