
Date: 2018-07-13 16:11:41

Tags: bash sorting awk

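Apologies; there should be a simple way to do what I want with a combination of sort / uniq / awk, but I can't find it. I have a table like the first one below, and for each Gene I want to keep only the rows with the largest Length, as in the second table.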


Length  Gene                    
3013    ENSDARG00000000018      
3013    ENSDARG00000000018      
2933    ENSDARG00000000018      
2933    ENSDARG00000000018      
2933    ENSDARG00000000018      
2933    ENSDARG00000000018      
2033    ENSDARG00000000068      
2033    ENSDARG00000000068      
901     ENSDARG00000000068      
901     ENSDARG00000000068      


  Length  Gene                    
  3013    ENSDARG00000000018      
  3013    ENSDARG00000000018      
  2033    ENSDARG00000000068      
  2033    ENSDARG00000000068      

The solution should work on a table with ca. 30,000 Gene values. Many thanks for your help!

2 answers:

Answer 0 (score: 2)


awk 'FNR==NR{a[$2]=(a[$2]>$1?a[$2]:$1);next} a[$2]==$1'  Input_file  Input_file


The same command, expanded with comments:

awk '
FNR==NR{                              ## FNR==NR is TRUE only while Input_file is being read for the first time.
  a[$2]=(a[$2]>$1?a[$2]:$1)           ## Array a is indexed by $2 (Gene): keep the stored value if it is greater than $1, otherwise store the current $1 (Length).
  next                                ## next is a built-in awk keyword that skips all further statements for this line.
}
a[$2]==$1                             ## Runs while Input_file is being read the second time: print the line if its first field equals the maximum stored in a[$2].
'  Input_file Input_file              ## Input_file is given twice so that awk reads it twice.
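
To try the two-pass idea on the sample table from the question, a minimal sketch follows. It is a defensive variation rather than the exact command above: it assumes the first line is a header, forces numeric comparison with `+0`, and prints the header explicitly. The temporary file and the names `input_file`, `max` and `result` are illustrative only.

```shell
# Write the sample table from the question to a temporary file (illustrative data only).
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
Length  Gene
3013    ENSDARG00000000018
3013    ENSDARG00000000018
2933    ENSDARG00000000018
2933    ENSDARG00000000018
2933    ENSDARG00000000018
2933    ENSDARG00000000018
2033    ENSDARG00000000068
2033    ENSDARG00000000068
901     ENSDARG00000000068
901     ENSDARG00000000068
EOF

result=$(awk '
  FNR == NR {                       # first pass over the file
    # Skip the header (assumed to be line 1); record the largest Length per Gene.
    if (FNR > 1 && (!($2 in max) || $1 + 0 > max[$2])) max[$2] = $1 + 0
    next
  }
  FNR == 1 || $1 + 0 == max[$2]     # second pass: print the header and every row at its gene maximum
' "$input_file" "$input_file")

printf '%s\n' "$result"
rm -f "$input_file"
```

Both passes are linear in the number of lines and the array holds one entry per gene, so a table with around 30,000 Gene values should not be a problem.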

Answer 1 (score: 0)


list_fields = [field.verbose_name for field in Cashflows._meta.get_fields() if not field.is_relation or field.one_to_one or (field.many_to_one and field.related_model)]