How do I reverse exon numbers on the negative strand?

Time: 2016-11-22 23:39:28

Tags: python awk

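As shown below, when a gene is on the negative strand ("-" in column 6), its exon numbers (column 5) should be reversed. For example, for the NT5C3B gene, the exons numbered 1, 2, ..., 9 should be renumbered 9, 8, ..., 1. I don't know how to do this programmatically with awk or Python. Any help would be greatly appreciated.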

chr17   39968961    39969531    FKBP10  1   +
chr17   39973309    39973455    FKBP10  2   +
chr17   39974340    39974530    FKBP10  3   +
chr17   39974633    39974779    FKBP10  4   +
chr17   39975461    39975651    FKBP10  5   +
chr17   39975781    39975927    FKBP10  6   +
chr17   39976520    39976713    FKBP10  7   +
chr17   39977198    39977341    FKBP10  8   +
chr17   39977905    39978069    FKBP10  9   +
chr17   39978474    39979469    FKBP10  10  +
chr17   39981333    39981909    NT5C3B  1   -
chr17   39983677    39983878    NT5C3B  2   -
chr17   39985041    39985204    NT5C3B  3   -
chr17   39987052    39987142    NT5C3B  4   -
chr17   39988643    39988729    NT5C3B  5   -
chr17   39991321    39991368    NT5C3B  6   -
chr17   39991454    39991524    NT5C3B  7   -
chr17   39992110    39992209    NT5C3B  8   -
chr17   39992433    39992523    NT5C3B  9   -
chr17   39994042    39994378    KLHL10  1   +
chr17   39998074    39998564    KLHL10  2   +
chr17   40001377    40001995    KLHL10  3   +
chr17   40003512    40003662    KLHL10  4   +
chr17   40004184    40004599    KLHL10  5   +
chr17   40009798    40011573    KLHL11  1   -
chr17   40021078    40021629    KLHL11  2   -
chr17   40023178    40024157    ACLY    1   -
chr17   40024961    40025038    ACLY    2   -
chr17   40025295    40025378    ACLY    3   -
chr17   40025726    40025840    ACLY    4   -
chr17   40027941    40028085    ACLY    5   -
chr17   40028284    40028435    ACLY    6   -
chr17   40030063    40030218    ACLY    7   -
chr17   40034355    40034449    ACLY    8   -
chr17   40035049    40035177    ACLY    9   -
chr17   40039374    40039485    ACLY    10  -
chr17   40040445    40040527    ACLY    11  -
chr17   40042364    40042561    ACLY    12  -
chr17   40043851    40043956    ACLY    13  -
chr17   40048531    40048700    ACLY    14  -
chr17   40049285    40049427    ACLY    15  -
chr17   40052872    40052902    ACLY    16  -
chr17   40054001    40054092    ACLY    17  -
chr17   40054883    40055038    ACLY    18  -
chr17   40057948    40058066    ACLY    19  -
chr17   40060981    40061043    ACLY    20  -
chr17   40061774    40061911    ACLY    21  -
chr17   40062780    40062899    ACLY    22  -
chr17   40063694    40063825    ACLY    23  -
chr17   40065241    40065321    ACLY    24  -
chr17   40065762    40065953    ACLY    25  -
chr17   40066474    40066537    ACLY    26  -
chr17   40068672    40068795    ACLY    27  -
chr17   40069967    40070149    ACLY    28  -
chr17   40075132    40075272    ACLY    29  -

Jeff

2 answers:

Answer 0: (score: 1)

awk to the rescue!

Assuming sorted input, as in the file provided:

awk '$4!=p{for(;i>0;i--) print a[i]; i=0; p=$4} 
   $6=="-"{a[++i]=$0; next} 1; 
       END{for(;i>0;i--) print a[i]}' file

To change only the exon numbers, it is easier to write a two-pass algorithm, e.g.

$ awk -v OFS='\t' 'NR==FNR{a[$4]=$5; next}
                   $6=="-"{$5=a[$4]-$5+1}1' file{,} | 
  column -t 

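Here file{,} is Bash brace expansion for "file file", so awk reads the input twice: the first pass records the last exon number seen for each gene, and the second pass rewrites column 5 on "-" lines. Assigning to $5 makes awk rejoin the fields with OFS, so the original spacing is lost; that is why OFS is set and column -t is used for pretty-printing.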
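Since the question also asks about Python, the same two-pass idea can be sketched there. This is only a minimal sketch, not part of the original answer: the function name renumber_minus_strand is made up for illustration, it assumes six whitespace-separated columns as in the sample, and it assumes gene names are unique across the file. Unlike the awk version, which keeps the last exon number seen, it takes the maximum per gene, so it does not depend on the exons being listed in ascending order.

```python
def renumber_minus_strand(lines):
    """Flip exon numbers (column 5) for genes on the '-' strand (column 6)."""
    # Split each non-empty line into its six whitespace-separated fields.
    rows = [line.split() for line in lines if line.strip()]

    # Pass 1: record the highest exon number seen for each gene.
    max_exon = {}
    for chrom, start, end, gene, exon, strand in rows:
        max_exon[gene] = max(max_exon.get(gene, 0), int(exon))

    # Pass 2: on the minus strand, exon n becomes (max - n + 1).
    out = []
    for chrom, start, end, gene, exon, strand in rows:
        if strand == "-":
            exon = str(max_exon[gene] - int(exon) + 1)
        # Rows are re-joined with tabs, so the original spacing is not kept.
        out.append("\t".join([chrom, start, end, gene, exon, strand]))
    return out


if __name__ == "__main__":
    sample = [
        "chr17   39978474    39979469    FKBP10  10  +",
        "chr17   40009798    40011573    KLHL11  1   -",
        "chr17   40021078    40021629    KLHL11  2   -",
    ]
    for line in renumber_minus_strand(sample):
        print(line)
```

To run it on a real file, pass the open file object instead of the sample list, e.g. renumber_minus_strand(open('file')). The rows stay in their original order; only column 5 changes on "-" lines.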

Answer 1: (score: 0)

Try this; I think it should do what you want. The output formatting won't be exactly the same, though.

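from itertools import groupby

# Read the file into a list of whitespace-split rows.
res = []
with open('filenamegoeshere.whatever') as f:
    for line in f:
        res.append(line.strip().split())
# Group consecutive rows by (gene, strand); rows of one gene must be adjacent.
res2 = [list(g) for k, g in groupby(res, lambda x: (x[3], x[5]))]
# Keep '+' groups as they are and reverse the row order of '-' groups.
res3 = [l if l[0][5] == '+' else list(reversed(l)) for l in res2]
# Flatten the groups back into a single list of rows.
res4 = [item for sublist in res3 for item in sublist]
for row in res4:
    print(" ".join(row))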