awk: summing rows based on a unique column

Time: 2014-02-14 17:02:42

Tags: awk count unique

I am looking for a more elegant way to do this (for more than 100 columns):

awk '{a[$1]+=$4}{b[$1]+=$5}{c[$1]+=$6}{d[$1]+=$7}{e[$1]+=$8}{f[$1]+=$9}{g[$1]+=$10}END{for(i in a) print i,a[i],b[i],c[i],d[i],e[i],f[i],g[i]}'

Here is the input:

 a1 1   1   2   2
 a2 2   5   3   7
 a2 2   3   3   8
 a3 1   4   6   1
 a3 1   7   9   4
 a3 1   2   4   2

And the output:

 a1 1 1 2 2
 a2 4 8 6 15
 a3 3 13 19 7

Thanks :)

3 answers:

Answer 0 (score: 5)

I split the one-liner into several lines to make it easier to read.

awk '{n[$1];for(i=2;i<=NF;i++)a[$1,i]+=$i}
    END{for(x in n){
        printf "%s ", x
        for(y=2;y<=NF;y++)printf "%s%s", a[x,y],(y==NF?ORS:OFS)
        }
    }' file

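This awk command should also work for your 100-column file, since it loops over every field from 2 to NF instead of naming each column.

Tested with your file: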

kent$  cat f
a1 1   1   2   2
a2 2   5   3   7
a2 2   3   3   8
a3 1   4   6   1
a3 1   7   9   4
a3 1   2   4   2

kent$  awk '{n[$1];for(i=2;i<=NF;i++)a[$1,i]+=$i}END{for(x in n){printf "%s ", x;for(y=2;y<=NF;y++)printf "%s%s", a[x,y],(y==NF?ORS:OFS)}}' f
a1 1 1 2 2
a2 4 8 6 15
a3 3 13 19 7
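
A possible variation on the command above, offered only as a sketch: some older awk implementations do not keep NF available in END, and the order of `for (x in n)` is unspecified. The version below records the column count in a variable while reading and pipes the result through sort. The names nf, keys, sum, input and result are my own choices and not part of the answer.

```shell
# Sample input from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
a1 1   1   2   2
a2 2   5   3   7
a2 2   3   3   8
a3 1   4   6   1
a3 1   7   9   4
a3 1   2   4   2
EOF

# nf remembers the widest row seen, so END does not depend on NF.
# sort makes the row order deterministic, because for-in order is unspecified.
result=$(awk '
{
    keys[$1] = 1                               # remember each distinct key
    if (NF > nf) nf = NF                       # track the column count
    for (i = 2; i <= NF; i++) sum[$1, i] += $i # per-key, per-column totals
}
END {
    for (k in keys) {
        line = k
        for (i = 2; i <= nf; i++) line = line OFS (sum[k, i] + 0)
        print line
    }
}' "$input" | sort)

printf '%s\n' "$result"
rm -f "$input"
```

Adding 0 to each sum means a column that is missing for some key prints as 0 rather than as an empty field.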

Answer 1 (score: 1)

If you care about the order of the output, try this:

$ cat file
a1 1   1   2   2
a2 2   5   3   7
a2 2   3   3   8
a3 1   4   6   1
a3 1   7   9   4
a3 1   2   4   2

Awk code:

$ cat tester
awk 'FNR==NR{
              U[$1]                             # array U is indexed by field 1
              for(i=2;i<=NF;i++)                # loop over columns 2 through NF
              A[$1,i]+=$i                       # array A holds the per-key column sums
              next                              # stop processing this record and go on to the next one
            }
   ($1 in U){                                   # second pass over the same file: runs at the first occurrence of each key
              for(i=2;i<=NF;i++)                # start at column 2, since column 1 is the key
              s = (i==2 ? "" : s OFS) A[$1,i]   # collect the sums in s so that one print statement is enough; printf would also work
              print $1,s                        # print column 1 followed by the sums
              delete U[$1]                      # this key is done, so later rows with the same key are skipped
              s = ""                            # reset s for the next key
            }' OFS='\t' file{,}                 # output field separator is a tab (a comma works too); file{,} is bash brace expansion for "file file"

Output:

$ bash tester
a1  1   1   2   2
a2  4   8   6   15
a3  3   13  19  7

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

--Edit--

As requested in the comments, here is the one-liner. In the post above I added comments for readability, which is why it spans several lines.

$ awk 'FNR==NR{U[$1];for(i=2;i<=NF;i++)A[$1,i]+=$i;next}($1 in U){for(i=2;i<=NF;i++)s = (i==2 ? "" : s OFS) A[$1,i];print $1,s;delete U[$1];s = ""}' OFS='\t' file{,}
a1  1   1   2   2
a2  4   8   6   15
a3  3   13  19  7
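
If reading the file twice is a problem (for example when the data arrives on a pipe), the same first-seen ordering can probably be had in a single pass. This is only a sketch and not the method of the answer above: it keeps the keys in an extra array in the order they first appear. The names seen, order, sum, nf and result are hypothetical.

```shell
# Feed the sample rows from the question through a pipe; the input is read once.
result=$(printf '%s\n' \
    'a1 1 1 2 2' 'a2 2 5 3 7' 'a2 2 3 3 8' \
    'a3 1 4 6 1' 'a3 1 7 9 4' 'a3 1 2 4 2' |
awk -v OFS='\t' '
!($1 in seen) {                    # first time this key appears
    seen[$1] = 1
    order[++n] = $1                # remember keys in input order
}
{
    if (NF > nf) nf = NF           # widest row seen so far
    for (i = 2; i <= NF; i++) sum[$1, i] += $i
}
END {
    for (k = 1; k <= n; k++) {     # walk the keys in first-seen order
        line = order[k]
        for (i = 2; i <= nf; i++) line = line OFS (sum[order[k], i] + 0)
        print line
    }
}')

printf '%s\n' "$result"
```

The trade-off is memory: every distinct key is held in both seen and order until END, which should be fine unless the number of keys is very large.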

Answer 2 (score: 1)

Using arrays of arrays in GNU awk version 4:

awk '{for (i=2;i<=NF;i++) a[$1][i]+=$i}
END{for (i in a)
      { printf "%s", i
        for (j=2;j<=NF;j++) printf "%s%s", OFS, a[i][j]
        printf "%s", ORS}
    }' file

a1 1 1 2 2
a2 4 8 6 15
a3 3 13 19 7