Keep all rows matching the highest value, based on values in another column - Bash

Time: 2018-07-13 16:11:41

Tags: bash sorting awk

Apologies; there should be a simple way to do what I want with some combination of sort / uniq / awk, but I can't find it.

Here is part of the "clean" data table I have already obtained (sorted by the Gene column, then by Length).

Length  Gene                    
3013    ENSDARG00000000018      
3013    ENSDARG00000000018      
2933    ENSDARG00000000018      
2933    ENSDARG00000000018      
2933    ENSDARG00000000018      
2933    ENSDARG00000000018      
2033    ENSDARG00000000068      
2033    ENSDARG00000000068      
901     ENSDARG00000000068      
901     ENSDARG00000000068      

For each Gene value, I need to keep all rows that have the highest value in the Length column. This is the desired output:

  Length  Gene                    
  3013    ENSDARG00000000018      
  3013    ENSDARG00000000018      
  2033    ENSDARG00000000068      
  2033    ENSDARG00000000068      

The solution should work for a table with ca. 30,000 Gene values. Thanks a lot for your help!

2 Answers:

Answer 0: (score: 2)

This simple awk should help you here.

awk 'FNR==NR{a[$2]=(a[$2]>$1?a[$2]:$1);next} a[$2]==$1'  Input_file  Input_file
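
As a quick check, the command can be run against the sample from the question. This is only a sketch: it assumes a POSIX shell with mktemp available, and the temporary file (held in the hypothetical variable input_file) stands in for Input_file; neither is part of the original answer.

```shell
# Recreate the question's sample table in a temporary file (stand-in for Input_file).
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
Length  Gene
3013    ENSDARG00000000018
3013    ENSDARG00000000018
2933    ENSDARG00000000018
2933    ENSDARG00000000018
2933    ENSDARG00000000018
2933    ENSDARG00000000018
2033    ENSDARG00000000068
2033    ENSDARG00000000068
901     ENSDARG00000000068
901     ENSDARG00000000068
EOF

# The same file is passed twice: the first read records the largest Length
# per Gene, the second read prints the rows whose Length equals that maximum.
result=$(awk 'FNR==NR{a[$2]=(a[$2]>$1?a[$2]:$1);next} a[$2]==$1' "$input_file" "$input_file")
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$input_file"
```

The header line survives because "Length" is recorded as the stored value for the key "Gene" during the first read and matches itself during the second.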

Explanation:

awk '
FNR==NR{                              ## True only while the first copy of Input_file is being read; FNR resets for the second copy, NR does not.
  a[$2]=(a[$2]>$1?a[$2]:$1)           ## Array a is indexed by Gene ($2) and holds the largest Length ($1) seen so far: keep the stored value if it is greater than $1, otherwise replace it with the current $1.
  next                                ## next is a built-in awk statement that skips all remaining rules and moves on to the next input line.
}
a[$2]==$1                             ## Reached only while Input_file is read the second time: print the line if its first field equals the maximum stored in a[$2].
'  Input_file Input_file              ## Input_file is named twice so that awk reads it two times.
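
When the table arrives on a pipe it cannot be read twice. Because the question says the data is already sorted by Gene and then by Length (largest first, as in the sample), a single pass may be enough. The following is a minimal sketch under those assumptions, not part of the original answer; the function name keep_longest_sorted is hypothetical, and the first line is assumed to be the header.

```shell
# Hypothetical helper: single-pass filter for input that is already sorted by
# Gene and, within each Gene, by Length in descending order.
keep_longest_sorted() {
  awk '
    NR == 1    { print; next }          # pass the header line through unchanged
    $2 != gene { gene = $2; max = $1 }  # first row of a new Gene carries its maximum Length
    $1 == max                           # print every row tied with that maximum
  ' "$@"
}
```

It can be called as `keep_longest_sorted Input_file` or placed at the end of a pipeline. Only the current Gene and its maximum are held in memory, so the number of distinct Gene values does not affect memory use; if the input is not sorted as assumed, the two-pass command above is the safer choice.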

Answer 1: (score: 0)

This should do it:

