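I have a list of short sequences, and I would like to get their coordinates, or a BED file, by comparing them against a FASTA file that contains the original sequences.
FASTA file: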
>PGH2
CGTAGCGGCTGAGTGCGCGGATAGCGCGTA
Short-sequence FASTA file:
>PGH2
CGGCTGAGT
Is there a way to get the coordinates? bedtools did not help.
Expected output:
PGH2 6 14
Answer 0 (score: 3)
Using Biopython:
from Bio import SeqIO

# Compare every long sequence against every short sequence.
for long_sequence_record in SeqIO.parse('long_sequences.fasta', 'fasta'):
    long_sequence = str(long_sequence_record.seq)
    for short_sequence_record in SeqIO.parse('short_sequences.fasta', 'fasta'):
        short_sequence = str(short_sequence_record.seq)
        if short_sequence in long_sequence:
            # str.index is 0-based; add 1 for a 1-based, inclusive start.
            start = long_sequence.index(short_sequence) + 1
            stop = start + len(short_sequence) - 1
            print(short_sequence_record.id, start, stop)
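Note that `str.index` reports only the first match. If a short sequence may occur more than once, one possible approach is to loop with `str.find`. This is a minimal sketch in plain Python; the function name `find_all` is hypothetical and not part of Biopython.

```python
def find_all(long_sequence, short_sequence):
    """Return 1-based, inclusive (start, stop) pairs for every occurrence.

    Overlapping occurrences are reported as well.
    """
    hits = []
    pos = long_sequence.find(short_sequence)
    while pos != -1:
        # pos is 0-based, so the 1-based inclusive range is pos + 1 .. pos + len.
        hits.append((pos + 1, pos + len(short_sequence)))
        # Resume the search one character further on to allow overlaps.
        pos = long_sequence.find(short_sequence, pos + 1)
    return hits


print(find_all("CGTAGCGGCTGAGTGCGCGGATAGCGCGTA", "CGGCTGAGT"))
```

For the sequences in the question this yields a single hit at 6 to 14.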
Answer 1 (score: 0)
str1 = "CGTAGCGGCTGAGTGCGCGGATAGCGCGTA"
str2 = "CGGCTGAGT"
# str.index returns the 0-based position of the first match.
index = str1.index(str2)
print(index)
Output: index = 5. To get 6 and 14, use index + 1 and index + len(str2).
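Since the question also mentions a BED file: BED uses a 0-based start and an exclusive end, so the 0-based index can be used directly as the start and index + len(str2) as the end. A minimal sketch, assuming one tab-separated line per hit; the helper name `bed_line` is hypothetical.

```python
def bed_line(name, long_sequence, short_sequence):
    """Return a BED line (0-based start, exclusive end) for the first match.

    Returns None when short_sequence does not occur in long_sequence.
    """
    index = long_sequence.find(short_sequence)
    if index == -1:
        return None
    # BED columns: name, start (0-based), end (exclusive).
    return "\t".join([name, str(index), str(index + len(short_sequence))])


str1 = "CGTAGCGGCTGAGTGCGCGGATAGCGCGTA"
str2 = "CGGCTGAGT"
print(bed_line("PGH2", str1, str2))
```

Here the BED interval is 5 to 14, which covers the same bases as the 1-based, inclusive range 6 to 14.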