How can I use awk to get the maximum and average score over all column A entries paired with each column B value?

Time: 2014-08-30 15:43:11

Tags: linux awk

Say I have a file1.csv consisting of column A and column A_score, as follows:

fefa68e,312
wernjnn,432
ew443fs,300

and a file2.csv consisting of pairs of column B and column A, as follows:

dfaefew,fefa68e
dfaefew,wernjnn
vzcxvvz,ew443fs
ewrwefd,wernjnn
ewrwefd,ew443fs

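How can I produce a file3.csv holding, for each column B value, the maximum score among the column A entries paired with it, as follows: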

dfaefew,432
vzcxvvz,300
ewrwefd,432

and a file4.csv holding, for each column B value, the average score of the column A entries paired with it, as follows:

dfaefew,372
vzcxvvz,300
ewrwefd,366

Can awk, or any other tool, do the job? I am using Ubuntu.

Thanks in advance!

Update

Suppose file2.csv looks like this instead:

dfaefew,fefa68e,1
dfaefew,wernjnn,1
vzcxvvz,ew443fs,1
ewrwefd,wernjnn,0
ewrwefd,ew443fs,0

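The third column is either 1 or 0, and it is the same for every row that shares a column 1 value (dfaefew, vzcxvvz, etc.). I want to keep the third column and get output like this: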

dfaefew,432,1
vzcxvvz,300,1
ewrwefd,432,0

dfaefew,372,1
vzcxvvz,300,1
ewrwefd,366,0

1 Answer:

Answer 0: (score: 3)

Here is one way to do it in awk:

Contents of script.awk:

# Set the input and output field separators to ","
BEGIN { FS = OFS = "," }   

# Processing first file
# Load the first file in hash keyed at column1 having value of column2
NR==FNR { value[$1] = $2; next }   

# Processing second file
# Keep a counter of column1 and add values for column2
{ count[$1]++; values[$1]+=value[$2] }   

# Find the max for each entry of column1
{ entry[$1] = (($1 in entry) && entry[$1]>value[$2] ? entry[$1] : value[$2]) } 

# In the END block traverse through array and print desired output.
END {
     for (max in entry) print (max, entry[max]) > "file3.csv";
     for (key in entry) print (key, values[key]/count[key]) > "file4.csv";
}

Run it like this:

awk -f script.awk file1.csv file2.csv

Output:

$ cat file3.csv
vzcxvvz,300
ewrwefd,432
dfaefew,432

$ cat file4.csv
vzcxvvz,300
ewrwefd,366
dfaefew,372
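
The rows above do not come out in the order the question shows, because awk's `for (key in array)` visits keys in an unspecified order. The script also does not cover the three-column file2.csv from the update. A minimal sketch that might handle both is below, assuming (as the update states) that column 3 is identical for every row sharing a column 1 value. The array names `score`, `best`, `sum`, `flag`, and `order` are my own choices, not part of the original script. The snippet writes the question's sample data into the current directory so it can be run as-is; use a scratch directory to avoid overwriting real files.

```shell
#!/bin/sh
# Sketch only: recreates the sample inputs from the question in the
# current directory, then writes file3.csv (max) and file4.csv (average).
set -e

printf '%s\n' 'fefa68e,312' 'wernjnn,432' 'ew443fs,300' > file1.csv
printf '%s\n' 'dfaefew,fefa68e,1' 'dfaefew,wernjnn,1' 'vzcxvvz,ew443fs,1' \
              'ewrwefd,wernjnn,0' 'ewrwefd,ew443fs,0' > file2.csv

awk '
BEGIN { FS = OFS = "," }

# First file: map each column A id to its score (+0 forces a number).
NR == FNR { score[$1] = $2 + 0; next }

# Second file: rows are B,A,flag.
{
    # Remember the order in which each B value first appears.
    if (!($1 in count)) order[++n] = $1
    count[$1]++
    sum[$1] += score[$2]
    if (!($1 in best) || score[$2] > best[$1]) best[$1] = score[$2]
    # Assumed constant per B value, so the last one seen is kept.
    flag[$1] = $3
}

# Emit results in first-seen order rather than hash order.
END {
    for (i = 1; i <= n; i++) {
        key = order[i]
        print key, best[key], flag[key] > "file3.csv"
        print key, sum[key] / count[key], flag[key] > "file4.csv"
    }
}
' file1.csv file2.csv
```

With the sample data, file3.csv should then contain dfaefew,432,1 / vzcxvvz,300,1 / ewrwefd,432,0 and file4.csv should contain dfaefew,372,1 / vzcxvvz,300,1 / ewrwefd,366,0, in the same order as the question. A column A id in file2.csv that is missing from file1.csv is silently treated as a score of 0 in this sketch.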