Joining BLAST hit subsets according to their position to get the complete hit

Time: 2014-02-01 10:36:38

Tags: python xml biopython blast

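I am using Biopython to do something similar to "Sort rps-blast results by position of the hit", but I want to join or concatenate the local hits so that I end up with contiguous query and subject sequences.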

My code:

for record in records:
    for alignment in record.alignments:
        # Sort the HSPs of this alignment by their start position on the query
        hits = sorted((hsp.query_start, hsp.query_end, hsp.sbjct_start, hsp.sbjct_end,
                       alignment.title, hsp.query, hsp.sbjct)
                      for hsp in alignment.hsps)
        for q_start, q_end, sb_start, sb_end, title, query, sbjct in hits:
            print(title)
            print('The query starts from position: ' + str(q_start))
            print('The query ends at position: ' + str(q_end))
            print('The hit starts at position: ' + str(sb_start))
            print('The hit ends at position: ' + str(sb_end))
            print('The query is: ' + query)
            print('The hit is: ' + sbjct)

This gives the sorted results:

Species_1
The query starts from position: 1
The query ends at position: 184
The hit starts at position: 1
The hit ends at position: 552
The query is: #######query_seq
The hit is: ######### hit_seq
Species_1
The query starts from position: 390
The query ends at position: 510
The hit starts at position: 549
The hit ends at position: 911
The query is: #######query_seq
The hit is: ######### hit_seq
Species_1
The query starts from position: 492
The query ends at position: 787
The hit starts at position: 889
The hit ends at position: 1776
The query is: #######query_seq
The hit is: ######### hit_seq

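This all works fine, but I want to take the next logical step: joining all three sub-queries and sub-hits shown here (the number of hits varies) to get the complete query and subject sequences. What is the way forward?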

1 Answer:

Answer 0: (score: 0)

OK, so here is a sample solution. Hopefully this helps!

You can create an empty variable outside the loop and concatenate the query strings onto it. Here is an edit of the code you gave:

expected_query_seq = ""
for record in records:
    for alignment in record.alignments:
        # Sort the HSPs of this alignment by their start position on the query
        hits = sorted((hsp.query_start, hsp.query_end, hsp.sbjct_start, hsp.sbjct_end,
                       alignment.title, hsp.query, hsp.sbjct)
                      for hsp in alignment.hsps)
        for q_start, q_end, sb_start, sb_end, title, query, sbjct in hits:
            print(title)
            print('The query starts from position: ' + str(q_start))
            print('The query ends at position: ' + str(q_end))
            print('The hit starts at position: ' + str(sb_start))
            print('The hit ends at position: ' + str(sb_end))
            print('The query is: ' + query)
            print('The hit is: ' + sbjct)

            # hsp.query already holds only the aligned segment, so append it whole.
            # Slicing it with the absolute coordinates (query[q_start:q_end]) would
            # cut the wrong characters or return an empty string.
            expected_query_seq += query
print(expected_query_seq)
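
Plain concatenation assumes the HSPs neither overlap nor leave gaps. In the output shown in the question, the second and third hits overlap on the query (390-510 and 492-787), and query positions 185-389 are not covered by any hit. Below is a minimal sketch of one way to stitch the query side together while handling both cases. The function name stitch_query_segments and the filler character "X" are hypothetical choices for illustration, not part of Biopython. It assumes 1-based inclusive coordinates with start <= end, and that each aligned string, once its "-" gap characters are removed, has exactly end - start + 1 characters (true for the query of a non-translated search).

```python
def stitch_query_segments(segments, filler="X"):
    """Merge (start, end, aligned_seq) tuples into one query string.

    start and end are 1-based inclusive query coordinates, as BLAST reports them.
    Overlapping parts are written once; uncovered positions are padded with filler.
    """
    stitched = []
    covered_to = 0  # last query position already written (0 means nothing yet)
    for start, end, aligned in sorted(segments):
        seq = aligned.replace("-", "")  # drop alignment gap characters
        if len(seq) != end - start + 1:
            raise ValueError("segment length does not match its coordinates")
        if end <= covered_to:
            continue  # segment lies entirely inside what is already written
        if start > covered_to + 1:
            # positions between two HSPs that no hit covers
            stitched.append(filler * (start - covered_to - 1))
        skip = max(0, covered_to + 1 - start)  # residues already written by an earlier HSP
        stitched.append(seq[skip:])
        covered_to = end
    return "".join(stitched)


# Toy data: a gap at positions 5-6 and an overlap at positions 9-10
example = [(1, 4, "ABCD"), (7, 10, "GH-IJ"), (9, 12, "IJKL")]
print(stitch_query_segments(example))  # ABCDXXGHIJKL
```

With Biopython, the input could be built per alignment as [(hsp.query_start, hsp.query_end, hsp.query) for hsp in alignment.hsps]. The subject side in the question looks like a translated search (query 1-184 against hit 1-552), so its coordinates would first have to be converted to the units of the aligned string; this sketch does not attempt that.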