Getting the most recent line for each value of a column with a shell script

Date: 2014-04-10 07:03:39

Tags: shell unix awk

I have a file in a format similar to this:

10:26:50 AEYTCVAETK     2   genes ADUm.1024,ADUm.5198,ADUm.750
10:27:51 AIQLTGK        1   genes ADUm.1999,ADUm.3560
10:35:12 AIQLTGK        8   genes ADUm.1999,ADUm.3560
10:42:26 AEYTCVAETK     2   genes ADUm.1024,ADUm.5198,ADUm.750
10:50:43 KHEPPTEVDIEGR  5   genes ADUm.367
10:52:23 VSSILEDKTT     9   genes ADUm.1192,ADUm.2731
10:52:26 AIQLTGK        10  genes ADUm.1999,ADUm.3560
10:55:16 VSSILEDKILSR   3   genes ADUm.2146,ADUm.5750
10:55:58 VSSILEDKILSR   2   genes ADUm.2146,ADUm.5750

For each distinct name in column 2, I want to print only the most recent line, so the input above should become:

10:42:26 AEYTCVAETK     2   genes ADUm.1024,ADUm.5198,ADUm.750
10:52:26 AIQLTGK        10  genes ADUm.1999,ADUm.3560
10:50:43 KHEPPTEVDIEGR  5   genes ADUm.367
10:52:23 VSSILEDKTT     9   genes ADUm.1192,ADUm.2731
10:55:58 VSSILEDKILSR   2   genes ADUm.2146,ADUm.5750

How can I do this?

Thanks and regards

1 answer:

Answer 0 (score: 3)

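If the file is already sorted by time, use awk. Each line overwrites a[$2], so after the last line the array holds the most recent line for every column-2 value. Because for (i in a) visits the keys in an unspecified order, the result is piped through sort -n to put it back in time order: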

awk '{a[$2]=$0}END{for (i in a) print a[i]}' file|sort -n

10:42:26 AEYTCVAETK     2   genes ADUm.1024,ADUm.5198,ADUm.750
10:50:43 KHEPPTEVDIEGR  5   genes ADUm.367
10:52:23 VSSILEDKTT     9   genes ADUm.1192,ADUm.2731
10:52:26 AIQLTGK        10  genes ADUm.1999,ADUm.3560
10:55:58 VSSILEDKILSR   2   genes ADUm.2146,ADUm.5750
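The expected output in the question lists the names in the order they first appear, not by time. A minimal sketch that reproduces that order, assuming (as the command above does) that the file is already in time order: it records each column-2 value the first time it is seen instead of relying on for (i in a), whose iteration order awk leaves unspecified, so no final sort is needed. The function name latest_per_key is purely illustrative.

```shell
#!/bin/sh
# latest_per_key (hypothetical name): print the last line for each distinct
# value of column 2, in the order each value first appears.
# Assumes the input is already in time order, as in the question.
# Reads the files given as arguments, or stdin if there are none.
latest_per_key() {
  awk '
    NF < 2 { next }                     # skip blank or malformed lines
    !($2 in last) { order[++n] = $2 }   # remember first-seen order of each key
    { last[$2] = $0 }                   # later lines overwrite earlier ones
    END { for (i = 1; i <= n; i++) print last[order[i]] }
  ' "$@"
}
```

For example, latest_per_key file prints the five lines in the same order as the expected output in the question.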

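If the original file is not sorted, run the command below. It strips the colons from the timestamp in column 1 and, for each column-2 value, keeps the line with the largest timestamp. The comparison is done on the resulting digit strings, so it assumes zero-padded HH:MM:SS times from a single day: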

awk '{s=$1;gsub(/:/,"",s);if (s>max[$2]){max[$2]=s;l[$2]=$0}}END{for (i in max) print l[i]}' file|sort -n
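An alternative for the unsorted case, sketched under the same assumption of zero-padded HH:MM:SS timestamps from one day (so a plain lexical sort of column 1 is also a chronological sort): sort newest-first, keep the first line seen for each column-2 value with the common awk idiom !seen[$2]++, then sort back into ascending order. LC_ALL=C is set so sort compares bytes rather than using locale collation rules; latest_per_key_unsorted is a hypothetical name.

```shell
#!/bin/sh
# latest_per_key_unsorted (hypothetical name): for input in any order,
# print the newest line for each column-2 value, oldest first.
# Assumes zero-padded HH:MM:SS timestamps in column 1, all from one day.
# Reads the files given as arguments, or stdin if there are none.
latest_per_key_unsorted() {
  LC_ALL=C sort -r -k1,1 "$@" |   # newest timestamp first
    awk '!seen[$2]++' |           # keep only the first (newest) line per key
    LC_ALL=C sort -k1,1           # back to oldest-first order
}
```

On the sample data, latest_per_key_unsorted file prints the same five lines as the first awk command. This version avoids awk's string-versus-number comparison rules by letting sort do the ordering.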