I have a file with some rows with the same name (col 1, eg gph ans stp) but in the col 2 maximum value and exclude the others rows in linux (shell, eg awk grep perl)
For example:
col1 | col2 | col3 | col4
------+------+--------+-----------
ghp | 25 | data1 | otherdata1
ghp | 23 | data2 | otherdata2
ghp | 23 | data3 | otherdata3
jkl | 58 | data4 | otherdata4
stp | 42 | data5 | otherdata5
stp | 41 | data6 | otherdata6
stp | 41 | data7 | otherdata7
result
col1 | col2 | col3 | col4
------+------+--------+-----------
ghp | 25 | data1 | otherdata2
jkl | 58 | data4 | otherdata4
stp | 42 | data5 | otherdata7
In the same file I have some rows with the same name (col 1, eg ghi ans stu) and in the col 2 the same values, I want to exclude these rows in linux (shell, eg awk grep perl)
For example
col1 | col2 | col3 | col4
------+------+--------+-----------
ghi | 23 | data1 | otherdata1
ghi | 23 | data2 | otherdata2
ghi | 23 | data3 | otherdata3
jkl | 58 | data4 | otherdata4
stu | 41 | data5 | otherdata5
stu | 41 | data6 | otherdata6
stu | 41 | data7 | otherdata7
Result:
col1 | col2 | col3 | col4
------+------+--------+-----------
jkl | 58 | data4 | otherdata4
Thanks
答案 0 :(得分:1)
You can use this awk command:
awk 'NR<=2{print; next} $2>max[$1]{max[$1]=$2; a[$1]=$0} END{for (i in a) print a[i]}' file
col1 | col2 | col3 | col4
------+------+--------+-----------
ghp | 25 | data1 | otherdata1
jkl | 58 | data4 | otherdata4
stp | 42 | data5 | otherdata5