How to exclude rows in which column one and two have the same value?

时间:2016-07-11 20:55:31

标签: linux shell select awk duplicates

I have a file with some rows with the same name (col 1, eg gph ans stp) but in the col 2 maximum value and exclude the others rows in linux (shell, eg awk grep perl)

For example:

col1 | col2 | col3   | col4
------+------+--------+-----------
ghp  | 25   | data1  | otherdata1
ghp  | 23   | data2  | otherdata2
ghp  | 23   | data3  | otherdata3
jkl  | 58   | data4  | otherdata4
stp  | 42   | data5  | otherdata5
stp  | 41   | data6  | otherdata6
stp  | 41   | data7  | otherdata7

result

 col1 | col2 | col3   | col4
 ------+------+--------+-----------
 ghp  | 25   | data1  | otherdata2
 jkl  | 58   | data4  | otherdata4
 stp  | 42   | data5  | otherdata7

In the same file I have some rows with the same name (col 1, eg ghi ans stu) and in the col 2 the same values, I want to exclude these rows in linux (shell, eg awk grep perl)

For example

col1 | col2 | col3   | col4
------+------+--------+-----------
ghi  | 23   | data1  | otherdata1
ghi  | 23   | data2  | otherdata2
ghi  | 23   | data3  | otherdata3
jkl  | 58   | data4  | otherdata4
stu  | 41   | data5  | otherdata5
stu  | 41   | data6  | otherdata6
stu  | 41   | data7  | otherdata7

Result:

 col1 | col2 | col3   | col4
 ------+------+--------+-----------
 jkl  | 58   | data4  | otherdata4

Thanks

1 个答案:

答案 0 :(得分:1)

You can use this awk command:

awk 'NR<=2{print; next} $2>max[$1]{max[$1]=$2; a[$1]=$0} END{for (i in a) print a[i]}' file

col1 | col2 | col3   | col4
------+------+--------+-----------
ghp  | 25   | data1  | otherdata1
jkl  | 58   | data4  | otherdata4
stp  | 42   | data5  | otherdata5