Python: How do I find the coordinates of short sequences in a FASTA file?

Time: 2015-05-21 02:55:01

Tags: python python-2.7 bioinformatics biopython fasta

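I have a list of short sequences, and I would like to get their coordinates, or a BED file, by comparing them against the FASTA file that contains the original sequences.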

FASTA file:

>PGH2
CGTAGCGGCTGAGTGCGCGGATAGCGCGTA

Short-sequence FASTA file:

>PGH2
CGGCTGAGT

Is there a way to get the coordinates? bedtools did not help.

Desired output:

PGH2  6 14

2 answers:

Answer 0: (score: 3)

Using Biopython:

from Bio import SeqIO

# Compare every long sequence against every short sequence.
for long_sequence_record in SeqIO.parse('long_sequences.fasta', 'fasta'):
    long_sequence = str(long_sequence_record.seq)

    for short_sequence_record in SeqIO.parse('short_sequences.fasta', 'fasta'):
        short_sequence = str(short_sequence_record.seq)

        # Only the first occurrence of each short sequence is reported.
        if short_sequence in long_sequence:
            # Convert the 0-based offset to 1-based, inclusive coordinates.
            start = long_sequence.index(short_sequence) + 1
            stop = start + len(short_sequence) - 1
            print('%s %d %d' % (short_sequence_record.id, start, stop))
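Note that index only returns the first match, so a short sequence that occurs more than once in the long sequence is reported only once. If every occurrence is needed, something like the following sketch might work. The helper name find_all is made up for this illustration and is not part of Biopython; it operates on plain strings such as str(record.seq).

```python
def find_all(long_sequence, short_sequence):
    """Return 1-based, inclusive (start, stop) pairs for every occurrence."""
    hits = []
    if not short_sequence:
        return hits  # an empty query would otherwise match at every position
    pos = long_sequence.find(short_sequence)
    while pos != -1:
        hits.append((pos + 1, pos + len(short_sequence)))
        # Advance by one position so overlapping matches are also found.
        pos = long_sequence.find(short_sequence, pos + 1)
    return hits


# Sequences taken from the question; "PGH2" is the record ID.
long_sequence = "CGTAGCGGCTGAGTGCGCGGATAGCGCGTA"
for start, stop in find_all(long_sequence, "CGGCTGAGT"):
    print("%s %d %d" % ("PGH2", start, stop))
```

Stepping forward by one position rather than by the length of the match is a deliberate choice here, so that overlapping hits (for example "AA" inside "AAAA") are all listed.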

Answer 1: (score: 0)

str1 = "CGTAGCGGCTGAGTGCGCGGATAGCGCGTA"
str2 = "CGGCTGAGT"
# str.index returns the 0-based offset of the first match.
index = str1.index(str2)
print(index)

Output: index = 5. To get 6 and 14, use index + 1 and index + len(str2).
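Since the question also asks about a BED file: BED uses a 0-based start and an exclusive end, so the same match would be written as 5 and 14 rather than 6 and 14. A minimal sketch of that conversion, assuming one match per sequence, is shown here. The function name to_bed_line is hypothetical, and it uses str.find instead of str.index so that a missing match returns None rather than raising ValueError.

```python
def to_bed_line(name, long_sequence, short_sequence):
    """Return a tab-separated BED line for the first match, or None."""
    index = long_sequence.find(short_sequence)  # 0-based offset, -1 if absent
    if index == -1:
        return None
    # BED convention: chromStart is 0-based, chromEnd is exclusive.
    return "%s\t%d\t%d" % (name, index, index + len(short_sequence))


str1 = "CGTAGCGGCTGAGTGCGCGGATAGCGCGTA"
str2 = "CGGCTGAGT"
print(to_bed_line("PGH2", str1, str2))
```

For the sequences in the question this prints PGH2, 5 and 14 separated by tabs.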