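I have a FASTA file containing a few sequences. I want to run a sliding window of size 5 over each sequence and extract the subsequence at every position as the window scans along.
For example (test1.fasta):
> human1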
ATCGCGTC
> human2
ATTTTCGCGA
Expected output (test1_out.txt):
> human1
ATCGC
> human1
TCGCG
> human1
CGCGT
> human1
GCGTC
> human2
ATTTT
> human2
TTTTC
> human2
TTTCG
> human2
TTCGC
> human2
TCGCG
> human2
CGCGA
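My code below only extracts the first five base pairs. How can I move a window of size 5 along each sequence with a step size of 1 and extract 5 bp at every position?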
from Bio import SeqIO
with open("test1_out.txt", "w") as f:
    for seq_record in SeqIO.parse("test1.fasta", "fasta"):
        f.write(str(seq_record.id) + "\n")
        f.write(str(seq_record.seq[:5]) + "\n")  # first 5 base positions
I adapted the code above from another example on Stack Overflow.
Answer 0 (score: 2)
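So I guess "seq_record.seq" holds the whole DNA sequence, e.g. "ATCGCGTC" for human1. A sequence of length n has n - 4 windows of size 5, so you can loop over those start positions like this: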
from Bio import SeqIO
with open("test1_out.txt", "w") as f:
    for seq_record in SeqIO.parse("test1.fasta", "fasta"):
        # one start index for every full 5-base window
        for i in range(len(seq_record.seq) - 4):
            f.write(str(seq_record.id) + "\n")
            f.write(str(seq_record.seq[i:i+5]) + "\n")  # 5-base window starting at position i
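
If the window size or step ever needs to change, the same idea could be pulled out into a small helper. The following is only a sketch in plain Python: the names sliding_windows and format_windows are hypothetical, not part of Biopython, and the ">" prefix on each header line is an assumption based on the expected output in the question (the loop above writes the bare id).

```python
def sliding_windows(seq, size=5, step=1):
    """Yield every full window of `size` characters, advancing `step` each time."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    # range() is empty when the sequence is shorter than the window,
    # so short sequences simply produce no output.
    for start in range(0, len(seq) - size + 1, step):
        yield seq[start:start + size]


def format_windows(records, size=5, step=1):
    """Return FASTA-style text with one header line per window.

    `records` is an iterable of (id, sequence) pairs of plain strings.
    The ">" header prefix is an assumption taken from the expected output.
    """
    lines = []
    for seq_id, seq in records:
        for window in sliding_windows(seq, size, step):
            lines.append(">" + seq_id)
            lines.append(window)
    return ("\n".join(lines) + "\n") if lines else ""
```

With Biopython, the pairs could presumably be built as (rec.id, str(rec.seq)) for each rec in SeqIO.parse("test1.fasta", "fasta"), and the returned text written to test1_out.txt in one call.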