Question

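Apologies; there should be a simple way to do what I want with some combination of sort / uniq / awk, but I can't find it.

Here is part of the "clean" data table I have already obtained (sorted by the Gene column, then by Length).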

Length  Gene                    
3013    ENSDARG00000000018      
3013    ENSDARG00000000018      
2933    ENSDARG00000000018      
2933    ENSDARG00000000018      
2933    ENSDARG00000000018      
2933    ENSDARG00000000018      
2033    ENSDARG00000000068      
2033    ENSDARG00000000068      
901     ENSDARG00000000068      
901     ENSDARG00000000068

For each Gene value, I need to keep all rows that have the highest value in the Length column. This is the desired output:

  Length  Gene                    
  3013    ENSDARG00000000018      
  3013    ENSDARG00000000018      
  2033    ENSDARG00000000068      
  2033    ENSDARG00000000068

The solution should work for a table with ca. 30,000 Gene values. Thank you very much for your help!

Answer 1

This simple awk command should help you here.

awk 'FNR==NR{a[$2]=(a[$2]>$1?a[$2]:$1);next} a[$2]==$1'  Input_file  Input_file

Explanation:

awk '
FNR==NR{                              ##FNR==NR is true only while the first copy of Input_file is being read.
  a[$2]=(a[$2]>$1?a[$2]:$1)           ##Array a is indexed by Gene ($2) and stores the largest Length ($1) seen so far: keep the stored value if it is greater than $1, otherwise replace it with $1.
  next                                ##next skips the remaining rules for this line and moves on to the next line.
}
a[$2]==$1                             ##Runs while Input_file is read the second time: print the line if its Length equals the maximum stored for its Gene.
'  Input_file Input_file              ##Input_file is named twice so that awk reads it two times.
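To try the command on the sample rows from the question, here is a minimal sketch. The temporary file and the variable names (input_file, result, result_sorted) are assumptions made for illustration only. The second awk call is a single-pass variant that is not part of the answer above; it relies on the ordering stated in the question, with the largest Length first within each Gene, and should not be used on unsorted input.

```shell
#!/bin/sh
# Hypothetical temp file holding the sample rows from the question.
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
Length  Gene
3013    ENSDARG00000000018
3013    ENSDARG00000000018
2933    ENSDARG00000000018
2933    ENSDARG00000000018
2933    ENSDARG00000000018
2933    ENSDARG00000000018
2033    ENSDARG00000000068
2033    ENSDARG00000000068
901     ENSDARG00000000068
901     ENSDARG00000000068
EOF

# Two-pass approach from the answer: pass 1 records the largest Length per Gene,
# pass 2 prints every row whose Length equals that maximum.
result=$(awk 'FNR==NR{a[$2]=(a[$2]>$1?a[$2]:$1);next} a[$2]==$1' "$input_file" "$input_file")

# Single-pass variant (assumption: rows are already grouped by Gene with the
# largest Length first). The first Length seen for a Gene is taken as its maximum.
result_sorted=$(awk '!($2 in first){first[$2]=$1} first[$2]==$1' "$input_file")

printf '%s\n' "$result"
rm -f "$input_file"
```

Both variants keep only one array entry per Gene in memory, so a table with around 30,000 Gene values should be well within reach. The header line passes through because its own first field is the only value recorded for the key "Gene".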

Answer 2

This should do it:


