BASH: Finding the maximum value within duplicate groups

Time: 2015-03-01 16:29:26

Tags: bash sorting csv awk grep

I have the following CSV file, file1.csv:

sales,artist
10,0131
10,0131
10,10_000 Maniacs
10,1000names
15,E1001 Ways
15,E1001 Ways
10,S101 Strings Orchestra
10,D101 Strings Orchestra
10,x0cc
10,x0cc

I am writing a BASH command to find the total sales for each artist. The output should be sorted by total sales in descending order.

Expected output.

30,E1001 Ways
20,0131
20,x0cc
10,10_000 Maniacs
10,1000names
10,S101 Strings Orchestra
10,D101 Strings Orchestra

I have written code to find the maximum, but it gives me the maximum sales value across all artists rather than the total sales for each artist.

 sort -nr file1.csv | awk 'BEGIN { FS="," }{ print $2; }'

Could anyone help me solve this? Thanks.

Output:

awk -F, 'NR > 1 { sales[$9] += $3 } END { for(s in sales) print sales[s] FS s }' million_songs_metadata_and_sales.csv | sort -nr -k1 | head -10

903,10000 Maniacs
562,51717
513,12012
506,35007
350,37500 Yens
2788,7000 Dying Rats
2325,2002
2210,1001 Ways
1992,1349
1968,1200 Techniques

1 answer:

Answer 0 (score: 3)

With GNU awk:

awk -F, 'NR > 1 { sales[$2] += $1 } END { PROCINFO["sorted_in"] = "@val_num_desc"; for(s in sales) print sales[s] FS s }' file1.csv

That is:

NR > 1 {                 # from the second line onwards (to skip the header)
  sales[$2] += $1        # sum up the totals
}
END {                    # and in the end

  # GNU-specific: array traversal in numerically descending order of value
  PROCINFO["sorted_in"] = "@val_num_desc"

  for(s in sales) {      # print the lot.
    print sales[s] FS s
  }
}

With plain awk:

awk -F, 'NR > 1 { sales[$2] += $1 } END { for(s in sales) print sales[s] FS s }' file1.csv | sort -nr

That is, remove the GNU-specific PROCINFO bit and pipe the output through sort -nr.
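
For anyone who wants to try the plain-awk variant end to end, here is a minimal sketch that recreates the sample file1.csv from the question and runs the pipeline on it. The scratch directory, the `result` variable, the second sort key on the artist name, and `LC_ALL=C` are my own additions, not part of the answer above; they only make the order of tied totals reproducible. It also assumes artist names contain no commas, since both awk and sort split on `,`.

```shell
#!/bin/sh
# Recreate the sample file1.csv from the question in a scratch directory
# (the directory is an assumption for illustration only).
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1.csv" <<'EOF'
sales,artist
10,0131
10,0131
10,10_000 Maniacs
10,1000names
15,E1001 Ways
15,E1001 Ways
10,S101 Strings Orchestra
10,D101 Strings Orchestra
10,x0cc
10,x0cc
EOF

# Sum sales per artist (skipping the header line), then sort:
#   -t,      fields are comma-separated
#   -k1,1nr  total sales, numeric, descending
#   -k2,2    artist name, ascending, only used to break ties
# LC_ALL=C makes the tie-break a plain byte comparison.
result=$(awk -F, 'NR > 1 { sales[$2] += $1 }
                  END { for (s in sales) print sales[s] FS s }' \
             "$tmpdir/file1.csv" | LC_ALL=C sort -t, -k1,1nr -k2,2)

printf '%s\n' "$result"

rm -rf "$tmpdir"
```

On the sample data this prints 30,E1001 Ways first, followed by the two artists with a total of 20 and then the four with a total of 10. Within each group of equal totals the names come out in byte order, which differs slightly from the order in the question's expected output; plain sort -nr leaves the order of ties to sort's last-resort comparison instead.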