Apologies; there should be a simple way to do what I want by combining sort / uniq / awk, but I can't find it.
Here is part of the "clean" data table I already have (sorted by the Gene column, then by Length).
Length Gene
3013 ENSDARG00000000018
3013 ENSDARG00000000018
2933 ENSDARG00000000018
2933 ENSDARG00000000018
2933 ENSDARG00000000018
2933 ENSDARG00000000018
2033 ENSDARG00000000068
2033 ENSDARG00000000068
901 ENSDARG00000000068
901 ENSDARG00000000068
For each Gene value, I need to keep all rows that have the highest value in the Length column. This is the desired output:
Length Gene
3013 ENSDARG00000000018
3013 ENSDARG00000000018
2033 ENSDARG00000000068
2033 ENSDARG00000000068
The solution should work for a table with ca. 30,000 Gene values. Thank you very much for your help!
Answer 0 (score: 2)
This simple awk should help you here.
awk 'FNR==NR{a[$2]=(a[$2]>$1?a[$2]:$1);next} a[$2]==$1' Input_file Input_file
Explanation:
awk '
FNR==NR{ ##FNR==NR is TRUE only while Input_file is being read for the first time.
a[$2]=(a[$2]>$1?a[$2]:$1) ##Array a is indexed by Gene ($2). If the stored value is greater than the current Length ($1), keep it; otherwise replace it with the current $1.
next ##next is a built-in awk statement that skips all remaining rules for the current line.
}
a[$2]==$1 ##Runs while Input_file is being read for the second time: if the maximum stored in a[$2] equals the first field of the current line, print that line.
' Input_file Input_file ##Input_file is named twice so that awk reads it in two passes.
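As a quick check, the command above can be run against the sample rows from the question. The sketch below is illustrative only: the temporary file and the variable names (input, two_pass, one_pass) are hypothetical and not part of the original answer. It also shows a single-pass variant, which is a different technique from the answer's two-pass approach and relies on an assumption: the rows are already sorted by Gene and then by descending Length, as in the sample shown, so the first row seen for each gene carries its maximum.

```shell
#!/bin/sh
# Hypothetical scratch file holding the sample rows from the question.
input=$(mktemp)
cat > "$input" <<'EOF'
Length Gene
3013 ENSDARG00000000018
3013 ENSDARG00000000018
2933 ENSDARG00000000018
2933 ENSDARG00000000018
2933 ENSDARG00000000018
2933 ENSDARG00000000018
2033 ENSDARG00000000068
2033 ENSDARG00000000068
901 ENSDARG00000000068
901 ENSDARG00000000068
EOF

# Two-pass command from the answer: pass 1 records the largest Length per Gene,
# pass 2 prints every row whose Length equals that recorded maximum.
two_pass=$(awk 'FNR==NR{a[$2]=(a[$2]>$1?a[$2]:$1);next} a[$2]==$1' "$input" "$input")

# Single-pass variant (assumption: input sorted by Gene, then descending Length).
# The header is printed as-is; the first Length seen for a gene is taken as its maximum.
one_pass=$(awk 'NR==1{print;next} !($2 in max){max[$2]=$1} $1==max[$2]' "$input")

printf '%s\n' "$two_pass"
rm -f "$input"
```

The two-pass form does not depend on the row order, at the cost of reading the file twice. The single-pass form reads the file once but gives wrong results if the sort assumption does not hold.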
Answer 1 (score: 0)
This should do it:
list_fields = [field.verbose_name for field in Cashflows._meta.get_fields() if not field.is_relation or field.one_to_one or (field.many_to_one and field.related_model)]