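The code below extracts short subsequences from each sequence using a window of size 100. The window moves along the sequence with a step of 1. I want to extract the short sequences with a step of 50 instead. Can anyone help me?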
from Bio import SeqIO
with open("B.fasta", "w") as f:
    for seq_record in SeqIO.parse("A.fasta", "fasta"):
        # Slide a 100-base window along the record, one position at a time
        for i in range(len(seq_record.seq) - 99):
            f.write(">" + seq_record.id + "\n")
            f.write(str(seq_record.seq[i:i+100]) + "\n")
Example FASTA file:
>hg17_ct_ER_ER_142
CTAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGGGG
Current output:
>hg17_ct_ER_ER_142
CTAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGG
>hg17_ct_ER_ER_142
TAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGGG
>hg17_ct_ER_ER_142
AAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGGGG
Expected output:
>hg17_ct_ER_ER_142
CTAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACA
>hg17_ct_ER_ER_142
AGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGG
Answer 0 (score: 1)
Just use the step argument of the range function:
for i in range(0, len(seq_record.seq) - 99, 50) :
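Note that with the window still fixed at 100, the 102-base example record only has room for one window at step 50, whereas the expected output in the question shows 50-base pieces. A minimal sketch that makes both the window and the step adjustable might look like the following. The names `sliding_windows` and `write_windows` are hypothetical helpers for illustration, not part of Biopython, and the file names are the ones from the question.

```python
def sliding_windows(seq, window=100, step=50):
    """Yield (start, piece) for every full-length window over seq."""
    # range stops at the last start position where a whole window still fits
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


def write_windows(in_path="A.fasta", out_path="B.fasta", window=100, step=50):
    """Hypothetical wrapper: write each window as its own FASTA record."""
    from Bio import SeqIO  # imported here so sliding_windows works without Biopython

    with open(out_path, "w") as out:
        for record in SeqIO.parse(in_path, "fasta"):
            for _, piece in sliding_windows(str(record.seq), window, step):
                out.write(">" + record.id + "\n")
                out.write(piece + "\n")


if __name__ == "__main__":
    # The 102-base example record from the question, split for readability
    example = (
        "CTAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACA"
        "AGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGG"
        "GG"
    )
    # Window 50, step 50: windows start at 0 and 50
    for start, piece in sliding_windows(example, window=50, step=50):
        print(start, piece)
```

Under these assumptions, `write_windows(window=50, step=50)` should give the two 50-base records shown as the expected output. Bases at the end that do not fill a whole window (the last two `G`s in the example) are dropped.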