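I am trying to read a list of XML files that were generated by submitting multiple sequences to the NCBI BLAST website. From each file, I want to print certain lines of information.
The files I want to read all have the suffix "_recombination.xml".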
for file in glob.glob("*_recombination.xml"):
    result_handle = open(file)
    blast_record = NCBIXML.read(result_handle)
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print "*****Alignment****"
            print "sequence:", alignment.title
            print "length:", alignment.length
            print "e-value:", hsp.expect
            print hsp.query
            print hsp.match
            print hsp.sbjct
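The script first finds all files with the "_recombination.xml" suffix. I then want it to read each file and print certain lines (this is almost a direct copy from the Biopython Cookbook), which it seems to do. But I get the following error: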
Traceback (most recent call last):
  File "Scripts/blast_test.py", line 202, in <module>
    blast_record=NCBIXML.read(result_handle)
  File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 576, in read
    first = iterator.next()
  File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 643, in parse
    expat_parser.Parse("", True) # End of XML record
xml.parsers.expat.ExpatError: no element found: line 3106, column 7594
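I'm not sure what the problem is. I don't know whether it is trying to loop back over files it has already read. For example, closing the files seems to help: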
for file in glob.glob("*_recombination.xml"):
    result_handle = open(file)
    blast_record = NCBIXML.read(result_handle)
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print "*****Alignment****"
            print "sequence:", alignment.title
            print "length:", alignment.length
            print "e-value:", hsp.expect
            print hsp.query
            print hsp.match
            print hsp.sbjct
    result_handle.close()
    blast_record.close()
But that gives me a different error:
Traceback (most recent call last):
  File "Scripts/blast_test.py", line 213, in <module>
    blast_record.close()
AttributeError: 'Blast' object has no attribute 'close'
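One thing worth checking before changing the parsing code: expat reports "no element found" when the input ends before the XML document is complete, so one of the files may be empty or truncated (for example, a download that was cut short). Below is a minimal sketch, using only the standard library, for finding which files are not well-formed. The function name find_broken_xml is made up for illustration; it is not part of Biopython.

```python
import glob
import xml.etree.ElementTree as ET


def find_broken_xml(pattern):
    """Return (filename, error message) pairs for files that are not well-formed XML."""
    broken = []
    for filename in sorted(glob.glob(pattern)):
        try:
            # iterparse streams the file; clearing each element keeps memory use low
            for _event, element in ET.iterparse(filename):
                element.clear()
        except ET.ParseError as error:
            broken.append((filename, str(error)))
    return broken


# Hypothetical usage with the file suffix from the question:
#   for filename, message in find_broken_xml("*_recombination.xml"):
#       print(filename, message)
```

Any file this reports would need to be regenerated or downloaded again, since no choice of parser function can recover a record that is cut off.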
Answer 0 (score: 2)
I usually use the parse method instead of read; maybe that will help you:
for blast_record in NCBIXML.parse(open(input_xml)):
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print "*****Alignment****"
            print "sequence:", alignment.title
            print "length:", alignment.length
            print "e-value:", hsp.expect
            print hsp.query
            print hsp.match
            print hsp.sbjct
Also make sure the XML was generated with -outfmt 5 when you ran the query.
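As for the close() error in the question: the object that comes back from NCBIXML is a record, not a file, so only the file handle needs closing, and a with block does that automatically. Here is a minimal sketch of that pattern over several files. The helper name iter_blast_records is made up for illustration, and the parser is passed in as an argument, which is an assumption made here so the helper itself does not depend on Biopython.

```python
import glob


def iter_blast_records(pattern, parse):
    """Yield (filename, record) pairs for every record in every matching file.

    `parse` is any callable that takes an open file handle and returns an
    iterable of records, e.g. Bio.Blast.NCBIXML.parse (assumed installed).
    """
    for filename in sorted(glob.glob(pattern)):
        # The with block closes the handle once this file's records are consumed
        with open(filename) as handle:
            for record in parse(handle):
                yield filename, record


# Hypothetical usage (requires Biopython):
#   from Bio.Blast import NCBIXML
#   for filename, record in iter_blast_records("*_recombination.xml", NCBIXML.parse):
#       for alignment in record.alignments:
#           print(filename, alignment.title)
```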
Answer 1 (score: 0)
I would add a comment to Biogeek's answer, but I can't (not enough reputation yet). He is right, you should use
NCBIXML.parse(open(input_xml))
instead of NCBIXML.read(open(input_xml)), because you are "trying to read a list of XML files", and for a list of XML files you need parse rather than read. Did that solve your problem?
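For what it's worth, the traceback in the question shows read() pulling its first item from parse(). That suggests the difference is about how many records a single file holds, not how many files there are. As far as I understand it, read() expects exactly one record in the handle, while parse() iterates over any number. Below is a rough sketch of that relationship in plain Python; read_single is a hypothetical stand-in and not Biopython's actual code.

```python
def read_single(records):
    """Return the only item from an iterable, or raise ValueError otherwise."""
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("No records found")
    try:
        next(iterator)
    except StopIteration:
        # Exactly one record was available, which is the case read() is meant for
        return first
    raise ValueError("More than one record found")
```

Under that assumption, a file produced from a multi-sequence submission holds several records and is better handled by looping over parse().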