I have a file in a format similar to this:
10:26:50 AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
10:27:51 AIQLTGK 1 genes ADUm.1999,ADUm.3560
10:35:12 AIQLTGK 8 genes ADUm.1999,ADUm.3560
10:42:26 AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
10:50:43 KHEPPTEVDIEGR 5 genes ADUm.367
10:52:23 VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
10:52:26 AIQLTGK 10 genes ADUm.1999,ADUm.3560
10:55:16 VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
10:55:58 VSSILEDKILSR 2 genes ADUm.2146,ADUm.5750
For each distinct name in column 2, I want to print only the most recent line, so the input above would become:
10:42:26 AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
10:52:26 AIQLTGK 10 genes ADUm.1999,ADUm.3560
10:50:43 KHEPPTEVDIEGR 5 genes ADUm.367
10:52:23 VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
10:55:58 VSSILEDKILSR 2 genes ADUm.2146,ADUm.5750
How can I do this?
Thanks and regards
Answer 0: (score: 3)
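If the file is already sorted by time, use awk. Each later line overwrites the stored line for its name, so the last one seen wins; the order of `for (i in a)` is unspecified, so the result is piped through sort to put it back in time order.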
awk '{a[$2]=$0}END{for (i in a) print a[i]}' file|sort -n
10:42:26 AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
10:50:43 KHEPPTEVDIEGR 5 genes ADUm.367
10:52:23 VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
10:52:26 AIQLTGK 10 genes ADUm.1999,ADUm.3560
10:55:58 VSSILEDKILSR 2 genes ADUm.2146,ADUm.5750
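The output above is ordered by time, whereas the desired output in the question lists names in the order they first appear. A minimal sketch that keeps first-appearance order, still assuming the file is already sorted by time. The function name `latest_per_name` and the arrays `order` and `last` are illustrative, not part of the original answer:

```shell
# latest_per_name FILE (hypothetical helper): print the last line seen for
# each name in column 2, keeping names in the order they first appear.
latest_per_name() {
  awk '
    !($2 in last) { order[++n] = $2 }   # record each name the first time it is seen
    { last[$2] = $0 }                   # later lines overwrite earlier ones
    END { for (i = 1; i <= n; i++) print last[order[i]] }
  ' "$1"
}
```

Called as `latest_per_name file`, this needs no sort step because the order is tracked explicitly in the `order` array.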
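If the original file is not sorted, run the following, which keeps the line with the largest timestamp for each name (the comparison assumes zero-padded HH:MM:SS times):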
awk '{s=$1;gsub(/:/,"",s);if (s>max[$2]){max[$2]=s;l[$2]=$0}}END{for (i in max) print l[i]}' file|sort -n
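The one-liner above strips the colons and compares the resulting digit strings, which relies on every time being zero-padded to the same width. A hedged variant that converts each time to seconds and compares numerically, assuming column 1 is always `H:MM:SS` or `HH:MM:SS` within a single day. The function name `latest_by_time` and the arrays `best` and `line` are made up for illustration:

```shell
# latest_by_time FILE (hypothetical helper): for each name in column 2,
# keep the line with the latest time in column 1, even if FILE is unsorted.
latest_by_time() {
  awk '
    {
      split($1, t, ":")                      # t[1]=hours, t[2]=minutes, t[3]=seconds
      secs = t[1] * 3600 + t[2] * 60 + t[3]  # seconds since midnight, as a number
      if (!($2 in best) || secs > best[$2]) {
        best[$2] = secs                      # latest time seen so far for this name
        line[$2] = $0                        # the full line carrying that time
      }
    }
    END { for (name in line) print line[name] }
  ' "$1" | sort -t: -k1,1n -k2,2n -k3,3n     # order by hours, minutes, seconds
}
```

The trailing `sort` splits on `:` and compares hours, minutes, and seconds numerically, so it also orders unpadded hours such as `9:05:00` before `10:00:00`.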