How to extract short sequences based on a step size?

Time: 2015-06-12 06:03:57

Tags: python bioinformatics extraction biopython fasta

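The code below extracts short sequences from each sequence using a window size of 100. The window moves along the sequence with a step size of 1, extracting one subsequence at each position. I want to extract the short sequences with a step size of 50 instead. Can anyone help me?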

 from Bio import SeqIO

 with open("B.fasta", "w") as f:
     for seq_record in SeqIO.parse("A.fasta", "fasta"):
         # Slide a 100-character window along the sequence, one position at a time.
         for i in range(len(seq_record.seq) - 99):
             f.write(">" + seq_record.id + "\n")
             f.write(str(seq_record.seq[i:i+100]) + "\n")

Example FASTA file:

>hg17_ct_ER_ER_142
CTAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGGGG

Example output:

>hg17_ct_ER_ER_142
CTAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGG
>hg17_ct_ER_ER_142
TAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGGG
>hg17_ct_ER_ER_142
AAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGGGG

Expected output:

>hg17_ct_ER_ER_142
CTAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACA
>hg17_ct_ER_ER_142
AGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGG

1 Answer:

Answer 0: (score: 1)

Just use the step argument of the range function:

for i in range(0, len(seq_record.seq) - 99, 50):
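To see what the step argument does, here is a minimal sketch that works on a plain string rather than a FASTA file. The helper names `window_starts` and `windows` and the `toy` sequence are made up for illustration and are not part of Biopython. Note that the expected output in the question shows 50-character records, so it may be that the window size should also be 50, not only the step; the sketch compares both settings on a 102-character sequence (the same length as the example record).

```python
def window_starts(seq_len, window, step):
    """Start offsets of every full-length window, advancing `step` each time."""
    return range(0, seq_len - window + 1, step)


def windows(seq, window, step):
    """Slice `seq` into full-length windows; a short trailing remainder is dropped."""
    return [seq[i:i + window] for i in window_starts(len(seq), window, step)]


# Hypothetical 102-character toy sequence, not the record from the question.
toy = "ACGT" * 25 + "AC"

step1 = windows(toy, 100, 1)    # original behaviour: window 100, step 1
step50 = windows(toy, 100, 50)  # the change above: window 100, step 50
tiles = windows(toy, 50, 50)    # window 50, step 50: non-overlapping pieces

print(len(step1), len(step50), len(tiles))  # prints: 3 1 2
```

In the question's loop, the same idea would be `range(0, len(seq_record.seq) - window + 1, step)` together with `seq_record.seq[i:i + window]`, where `window` and `step` are variables you define yourself. With a window of 100 and a step of 50, a 102-character record yields a single 100-character piece; with both set to 50 it yields two 50-character pieces and the last two characters are dropped.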