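I have a text file of alignments. I need to extract the positions of nucleotides, but only where the special character "#" appears directly below the sequence. The position is defined by the number given on the Genome line (for example, Genome 19). For the first nucleotide marked with # the position is 20, for the second it is 24, and so on for all other marked nucleotides.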
<----------------------------------INPUT----------------------->
===========================
Alignment results No. 7
===========================
Genome 19 CGAGCCTCCCGCACCGCCCCCCTCCAGG
|#|:|#||||||||||||||||||||!:
Bisulfite 53 CAAACCTCCCACACCACCCCCCTCCAGA
Genome 1184 GCCTGGCCG
#
Bisulfite 190 -T-------
===========================
Alignment results No. 94
===========================
>genome sequence
ACAAGTGTCACGTCTGCATGTTGGCACA
--------------------------
Alignment
--------------------------
'*' : Methylated CpG
'#' : Unmethylated CpG
--------------------------
Genome 1009 GAGCCTCCCGCACCGCCCCCCTCCAGG
#|#||||||#||||#||||||||||!#
Bisulfite 53 AAACCTCCCACACCACCCCCCTCCAGA
<------------------Desired Output--------------------->
PRIMER '#' Position
VII122_10_PRIMER-1 20 24 1185
VII123_10_PRIMER-3 1009 1011
<--------------------------MY CODE-------------------------->
```
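with open("201907031925_all_alignment_data.txt") as fo:
    # Split the file into one chunk per alignment block.
    for x in fo.read().split('===========================\n A'):
        print(x)
# No fo.close() needed: the with-block closes the file on exit.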
```
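One possible way to get from the split blocks to the '#' positions is sketched below. It assumes that in the real file the marker line is column-aligned with the sequence on the preceding `Genome` line (the leading spaces seem to have been lost in the paste above), so the genome coordinate is the `Genome` start number plus the offset of the `#` from the first base. The function name `hash_positions` and the two regular expressions are my own, not part of any existing tool. The primer names in the desired output do not appear in the input shown, so the sketch keys the results by alignment number instead.

```python
import re

# Hypothetical patterns, based only on the sample input shown in the question.
HEADER = re.compile(r"Alignment results No\.\s*(\d+)")
GENOME = re.compile(r"^\s*Genome\s+(\d+)\s+(\S+)")


def hash_positions(text):
    """Map each alignment number to the genome positions marked with '#'.

    Assumes the line directly below each 'Genome' line is the marker line
    and that it is column-aligned with the genome sequence.
    """
    result = {}
    current = None  # alignment number of the block being read
    lines = text.splitlines()
    for i, line in enumerate(lines):
        header = HEADER.search(line)
        if header:
            current = int(header.group(1))
            result.setdefault(current, [])
            continue
        genome = GENOME.match(line)
        if genome is None or i + 1 >= len(lines):
            continue
        start = int(genome.group(1))   # coordinate of the first base
        seq_col = genome.start(2)      # column where the sequence begins
        marker = lines[i + 1]
        for col in range(seq_col, len(marker)):
            if marker[col] == "#":
                result.setdefault(current, []).append(start + col - seq_col)
    return result


# Small example rebuilt from alignment No. 7, with the alignment restored.
sample = "\n".join([
    "Alignment results No. 7",
    "Genome 19 CGAGCCTCCCGCACCGCCCCCCTCCAGG",
    " " * 10 + "|#|:|#||||||||||||||||||||!:",
    "Bisulfite 53 CAAACCTCCCACACCACCCCCCTCCAGA",
    "Genome 1184 GCCTGGCCG",
    " " * 12 + " #",
    "Bisulfite 190 -T-------",
])
print(hash_positions(sample))  # {7: [20, 24, 1185]}
```

Because only the line immediately after a `Genome` line is scanned, the legend line `'#' : Unmethylated CpG` is not counted. To run it on the real file, pass `fo.read()` to `hash_positions` inside the `with` block.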