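As shown below, when a gene is on the minus strand ("-" in column 6), its exon numbers (column 5) should be reversed. For example, for the NT5C3B gene, exons numbered 1, 2, ..., 9 should be renumbered 9, 8, ..., 1. I don't know how to do this programmatically with awk or Python. Thanks a lot for your help.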
chr17 39968961 39969531 FKBP10 1 +
chr17 39973309 39973455 FKBP10 2 +
chr17 39974340 39974530 FKBP10 3 +
chr17 39974633 39974779 FKBP10 4 +
chr17 39975461 39975651 FKBP10 5 +
chr17 39975781 39975927 FKBP10 6 +
chr17 39976520 39976713 FKBP10 7 +
chr17 39977198 39977341 FKBP10 8 +
chr17 39977905 39978069 FKBP10 9 +
chr17 39978474 39979469 FKBP10 10 +
chr17 39981333 39981909 NT5C3B 1 -
chr17 39983677 39983878 NT5C3B 2 -
chr17 39985041 39985204 NT5C3B 3 -
chr17 39987052 39987142 NT5C3B 4 -
chr17 39988643 39988729 NT5C3B 5 -
chr17 39991321 39991368 NT5C3B 6 -
chr17 39991454 39991524 NT5C3B 7 -
chr17 39992110 39992209 NT5C3B 8 -
chr17 39992433 39992523 NT5C3B 9 -
chr17 39994042 39994378 KLHL10 1 +
chr17 39998074 39998564 KLHL10 2 +
chr17 40001377 40001995 KLHL10 3 +
chr17 40003512 40003662 KLHL10 4 +
chr17 40004184 40004599 KLHL10 5 +
chr17 40009798 40011573 KLHL11 1 -
chr17 40021078 40021629 KLHL11 2 -
chr17 40023178 40024157 ACLY 1 -
chr17 40024961 40025038 ACLY 2 -
chr17 40025295 40025378 ACLY 3 -
chr17 40025726 40025840 ACLY 4 -
chr17 40027941 40028085 ACLY 5 -
chr17 40028284 40028435 ACLY 6 -
chr17 40030063 40030218 ACLY 7 -
chr17 40034355 40034449 ACLY 8 -
chr17 40035049 40035177 ACLY 9 -
chr17 40039374 40039485 ACLY 10 -
chr17 40040445 40040527 ACLY 11 -
chr17 40042364 40042561 ACLY 12 -
chr17 40043851 40043956 ACLY 13 -
chr17 40048531 40048700 ACLY 14 -
chr17 40049285 40049427 ACLY 15 -
chr17 40052872 40052902 ACLY 16 -
chr17 40054001 40054092 ACLY 17 -
chr17 40054883 40055038 ACLY 18 -
chr17 40057948 40058066 ACLY 19 -
chr17 40060981 40061043 ACLY 20 -
chr17 40061774 40061911 ACLY 21 -
chr17 40062780 40062899 ACLY 22 -
chr17 40063694 40063825 ACLY 23 -
chr17 40065241 40065321 ACLY 24 -
chr17 40065762 40065953 ACLY 25 -
chr17 40066474 40066537 ACLY 26 -
chr17 40068672 40068795 ACLY 27 -
chr17 40069967 40070149 ACLY 28 -
chr17 40075132 40075272 ACLY 29 -
Jeff
Answer 0 (score: 1)
awk to the rescue!
Assuming sorted input, as in the provided file, this reverses the row order of each minus-strand gene:
awk '$4!=p{for(;i>0;i--) print a[i]; i=0; p=$4}
$6=="-"{a[++i]=$0; next} 1;
END{for(;i>0;i--) print a[i]}' file
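For readers who prefer Python, the buffering logic of the awk one-liner above can be sketched as follows. This is a minimal sketch, assuming whitespace-separated columns and input in which each gene's rows are contiguous; the function name `reverse_minus_strand_rows` is illustrative and the sample rows are taken from the question's data. Like the awk version, it only reorders rows and leaves column 5 unchanged.

```python
def reverse_minus_strand_rows(lines):
    """Reverse the row order within each '-' strand gene block.

    Mirrors the awk answer: rows of the current gene on the '-' strand are
    buffered and emitted in reverse when the gene name (column 4) changes
    or the input ends. Assumes the rows of one gene are contiguous.
    """
    out = []
    buffer = []       # pending '-' strand rows of the current gene
    prev_gene = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        gene, strand = fields[3], fields[5]
        if gene != prev_gene:
            out.extend(reversed(buffer))  # flush the previous gene's rows
            buffer = []
            prev_gene = gene
        if strand == "-":
            buffer.append(line.rstrip("\n"))
        else:
            out.append(line.rstrip("\n"))
    out.extend(reversed(buffer))  # flush the last gene at end of input
    return out


sample_rows = [
    "chr17 39977905 39978069 FKBP10 9 +",
    "chr17 39978474 39979469 FKBP10 10 +",
    "chr17 40009798 40011573 KLHL11 1 -",
    "chr17 40021078 40021629 KLHL11 2 -",
    "chr17 40023178 40024157 ACLY 1 -",
]

for row in reverse_minus_strand_rows(sample_rows):
    print(row)
```

To process a whole file, pass an open file object instead of the list; iterating over it yields lines with trailing newlines, which the function strips.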
If you only want to change the exon numbers (keeping the row order), a two-pass algorithm is easier to write, e.g.
$ awk -v OFS='\t' 'NR==FNR{a[$4]=$5; next}
$6=="-"{$5=a[$4]-$5+1}1' file{,} |
column -t
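The input formatting is lost (assigning to $5 makes awk rebuild the line using OFS), which is why OFS is set and the output is piped through column -t for pretty-printing.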
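The two-pass renumbering can be sketched in Python in the same spirit. This assumes whitespace-separated input with the gene name in column 4, the exon number in column 5 and the strand in column 6; the function name `renumber_minus_strand` is hypothetical. Instead of reading the file twice (as `file{,}` does), it reads the rows into a list and makes both passes over that list, and it takes the maximum exon number per gene rather than the last one seen, which is the same thing for sorted input. Output fields are tab-separated, matching `OFS='\t'`.

```python
def renumber_minus_strand(lines):
    """Reverse exon numbers (column 5) of '-' strand genes, keeping row order.

    Pass 1 records the highest exon number per gene (column 4).
    Pass 2 rewrites column 5 as max - old + 1 on '-' strand rows.
    """
    rows = [line.split() for line in lines if line.strip()]

    # Pass 1: highest exon number seen for each gene.
    max_exon = {}
    for fields in rows:
        gene, exon = fields[3], int(fields[4])
        max_exon[gene] = max(exon, max_exon.get(gene, exon))

    # Pass 2: renumber '-' strand rows, leave '+' strand rows untouched.
    out = []
    for fields in rows:
        if fields[5] == "-":
            new_exon = max_exon[fields[3]] - int(fields[4]) + 1
            fields = fields[:4] + [str(new_exon)] + fields[5:]
        out.append("\t".join(fields))
    return out


exon_rows = [
    "chr17 40004184 40004599 KLHL10 5 +",
    "chr17 40009798 40011573 KLHL11 1 -",
    "chr17 40021078 40021629 KLHL11 2 -",
]

for row in renumber_minus_strand(exon_rows):
    print(row)
```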
Answer 1 (score: 0)
Try this; I think it should do what you want, although the formatting won't be exactly the same.
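from itertools import groupby

res = []
with open('filenamegoeshere.whatever') as f:
    for line in f:
        if line.strip():  # skip blank lines, which would break x[3] below
            res.append(line.strip().split())

# group consecutive rows by (gene, strand)
res2 = [list(g) for k, g in groupby(res, lambda x: (x[3], x[5]))]
# reverse the row order of each '-' strand group
res3 = [l if l[0][5] == '+' else list(reversed(l)) for l in res2]
# flatten the groups back into a single list of rows
res4 = [item for sublist in res3 for item in sublist]
for row in res4:
    print(" ".join(row))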
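If renumbering (rather than reordering rows) is what's wanted, the groupby approach can be adapted: within each '-' strand group, the list of exon labels is reversed and reassigned to the rows in their original order. This is a minimal sketch, assuming each gene's rows are contiguous; `renumber_with_groupby` is a hypothetical name and the sample rows come from the question's data. Because it reverses the existing labels instead of computing max - old + 1, it does not require the exon numbers to be contiguous.

```python
from itertools import groupby


def renumber_with_groupby(lines):
    """Reverse the exon labels (column 5) within each '-' strand gene."""
    rows = [line.split() for line in lines if line.strip()]
    out = []
    # Consecutive rows sharing (gene, strand) form one group.
    for (_, strand), group in groupby(rows, key=lambda r: (r[3], r[5])):
        group = list(group)
        if strand == "-":
            labels = [r[4] for r in group]  # e.g. ['1', '2', '3']
            for r, label in zip(group, reversed(labels)):
                r[4] = label                # rows keep their order
        out.extend(" ".join(r) for r in group)
    return out


acly_rows = [
    "chr17 40003512 40003662 KLHL10 4 +",
    "chr17 40023178 40024157 ACLY 1 -",
    "chr17 40024961 40025038 ACLY 2 -",
    "chr17 40025295 40025378 ACLY 3 -",
]

for row in renumber_with_groupby(acly_rows):
    print(row)
```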