I'm starting to use Biopython and I have a question about parsing results. I followed a tutorial to get started, and here is the code I used:
from Bio.Blast import NCBIXML

for record in NCBIXML.parse(open("/Users/jcastrof/blast/pruebarpsb.xml")):
    if record.alignments:
        print("Query: %s..." % record.query[:60])
        for align in record.alignments:
            for hsp in align.hsps:
                print(" %s HSP,e=%f, from position %i to %i"
                      % (align.hit_id, hsp.expect, hsp.query_start, hsp.query_end))
Part of the output I get is:
gnl|CDD|225858 HSP,e=0.000000, from position 32 to 1118
gnl|CDD|225858 HSP,e=0.000000, from position 1775 to 2671
gnl|CDD|214836 HSP,e=0.000000, from position 37 to 458
gnl|CDD|214836 HSP,e=0.000000, from position 1775 to 2192
gnl|CDD|214838 HSP,e=0.000000, from position 567 to 850
What I want to do is sort the results by the position of the hit (Hsp_hit-from), like this:
gnl|CDD|225858 HSP,e=0.000000, from position 32 to 1118
gnl|CDD|214836 HSP,e=0.000000, from position 37 to 458
gnl|CDD|214838 HSP,e=0.000000, from position 567 to 850
gnl|CDD|225858 HSP,e=0.000000, from position 1775 to 2671
gnl|CDD|214836 HSP,e=0.000000, from position 1775 to 2192
My rps-blast input file is a *.xml file. Any suggestions on how to proceed?
Thanks!
Answer 0 (score: 2)
The HSP list is just a Python list and can be sorted as usual. Try:
align.hsps.sort(key = lambda hsp: hsp.query_start)
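However, you are dealing with nested lists (each hit has its own list of HSPs), and you want to sort across all of them. Building your own list is probably the best approach here, something like this: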
for record in ...:
    print("Query: %s..." % record.query[:60])
    # Outer loop (alignments) must come first so that `align` is bound
    # before its HSPs are iterated.
    hits = sorted((hsp.query_start, hsp.query_end, hsp.expect, align.hit_id)
                  for align in record.alignments for hsp in align.hsps)
    for q_start, q_end, expect, hit_id in hits:
        print(" %s HSP,e=%f, from position %i to %i"
              % (hit_id, expect, q_start, q_end))
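The flatten-then-sort idea can be tried without a BLAST file. The following is only a minimal sketch: `Hsp` and `Alignment` are hypothetical namedtuple stand-ins for the Biopython objects, `hsps_by_query_start` is a helper name made up for illustration, the coordinates are copied from the question's output, and the e-values are placeholders (the real values are not shown in the question).

```python
from collections import namedtuple

# Hypothetical stand-ins for Biopython's alignment and HSP objects;
# only the attributes used in the question are modelled.
Hsp = namedtuple("Hsp", "query_start query_end expect")
Alignment = namedtuple("Alignment", "hit_id hsps")


def hsps_by_query_start(alignments):
    """Flatten all (alignment, hsp) pairs and sort them by hsp.query_start."""
    pairs = [(align, hsp) for align in alignments for hsp in align.hsps]
    # sorted() is stable, so HSPs with equal query_start keep their input order.
    return sorted(pairs, key=lambda pair: pair[1].query_start)


# Coordinates taken from the question; expect=0.0 is a placeholder value.
alignments = [
    Alignment("gnl|CDD|225858", [Hsp(32, 1118, 0.0), Hsp(1775, 2671, 0.0)]),
    Alignment("gnl|CDD|214836", [Hsp(37, 458, 0.0), Hsp(1775, 2192, 0.0)]),
    Alignment("gnl|CDD|214838", [Hsp(567, 850, 0.0)]),
]

for align, hsp in hsps_by_query_start(alignments):
    # %g keeps very small e-values visible, where %f would round them to 0.000000.
    print(" %s HSP,e=%g, from position %i to %i"
          % (align.hit_id, hsp.expect, hsp.query_start, hsp.query_end))
```

Sorting with a key on `query_start` alone leaves ties (the two HSPs starting at 1775) in their original hit order, which matches the ordering requested in the question. Sorting whole tuples, as in the snippet above, would instead break ties by `query_end`.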
Peter