My question is very similar to one asked earlier:
Output whole line once for each unique value of a column (Bash)
but with one major difference. In that question's example:
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> AIQLTGK 8 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
pep> VSSILEDKILSR 2 genes ADUm.2146,ADUm.5750
The goal there was to print one line for each distinct peptide value in column 2, so the input above would become:
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
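For reference, this "first occurrence wins" behaviour is commonly written with the awk idiom !seen[$2]++. This is an assumption about how the earlier question is typically solved, not a quote from that thread. The sketch writes the sample input to a temporary file; the file and the first_lines variable are illustrative names only.

```shell
#!/bin/sh
# Write the sample input from the question to a temporary file (illustrative)
file=$(mktemp)
cat > "$file" <<'EOF'
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> AIQLTGK 8 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
pep> VSSILEDKILSR 2 genes ADUm.2146,ADUm.5750
EOF

# seen[$2]++ evaluates to 0 (false) the first time a column-2 value appears
# and to a positive number afterwards, so !seen[$2]++ is true exactly once
# per peptide: only its first line is printed, in input order.
first_lines=$(awk '!seen[$2]++' "$file")

rm -f "$file"
printf '%s\n' "$first_lines"
```

This keeps whichever line comes first for each peptide, regardless of the value in column 3, which is exactly the limitation this question is about.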
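What I want is likewise one line for each unique entry in column 2, but the line printed should be the one with the highest value in column 3, so the output would look like this: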
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
Thanks in advance.
Answer 0 (score: 1)
Here is one way to do it:
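awk '
# Peptide ($2) already seen: replace the stored line only if this count ($3) is higher
($2 in seen) {
    if ($3 > seen[$2]) {
        seen[$2] = $3      # remember the new maximum for later comparisons
        line[$2] = $0
    }
    next
}
# First occurrence of this peptide: store its count and the whole line
{
    seen[$2] = $3
    line[$2] = $0
}
END {
    # for-in iteration order is unspecified, so lines may not come out in input order
    for (x in line) print line[x]
}' file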
Output:
pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> KHEPPTEVDIEGR 5 genes ADUm.367
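The output above is correct per peptide but not in the order the question shows, because awk's for (x in line) loop visits keys in an unspecified order. If first-appearance order matters, one possible variant (a sketch, not the answerer's code) records each peptide's position when it is first seen and prints in that order in END. The temporary file and the max_lines variable are illustrative names; replace the heredoc with your real input file.

```shell
#!/bin/sh
# Write the sample input from the question to a temporary file (illustrative)
file=$(mktemp)
cat > "$file" <<'EOF'
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> AIQLTGK 8 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
pep> VSSILEDKILSR 2 genes ADUm.2146,ADUm.5750
EOF

# For each peptide ($2) keep the line with the largest count ($3),
# and print peptides in the order they first appear in the input.
max_lines=$(awk '
!($2 in best) {
    order[++n] = $2     # first appearance: remember its position
    best[$2] = $3 + 0   # force a numeric value
    line[$2] = $0
    next
}
$3 + 0 > best[$2] {
    best[$2] = $3 + 0   # new maximum for this peptide
    line[$2] = $0
}
END {
    for (i = 1; i <= n; i++) print line[order[i]]
}' "$file")

rm -f "$file"
printf '%s\n' "$max_lines"
```

Because the comparison is a strict >, when two lines share the same maximum count the earlier one is kept. Adding 0 to $3 makes the comparison numeric, so 10 ranks above 8 rather than being compared as text.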