I've run into a very serious problem. Here is the code.
#!/usr/bin/python
# filename: parse_dblp.py
# author: ivanchou
import codecs, os
import xml.etree.ElementTree as ET

paper_tag = ('article', 'inproceedings', 'proceedings', 'book',
             'incollection', 'phdthesis', 'mastersthesis', 'www')

class AllEntities:
    def __getitem__(self, key):
        return key

print ('----------parse begin----------')
# the parse result is stored in the file 'authors'
result = codecs.open('authors', 'w', 'utf-8')
parser = ET.XMLParser()
parser.parser.UseForeignDTD(True)
parser.entity = AllEntities()
for event, article in ET.iterparse('dblp_part.xml', events=("start",
        "end"), parser=parser):
    for author in article.findall('author'):
        result.write(author.text + u'|')
    if event == 'end' and article.tag in paper_tag:
        result.write(os.linesep)
        article.clear()
print ('----------parse end----------')
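I put the file dblp_part.xml in a gist here: dblp_part.xml
It contains the first 2336 lines of dblp.xml, and parsing fails with a NoneType error at the last article element. If I swap the last two elements, everything works fine. So is this a bug in ElementTree?
I'm new to Python and would appreciate any help.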
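One possible explanation, not confirmed against the actual traceback: the ElementTree documentation states that when iterparse() emits a "start" event, the element's text, tail, and children are not guaranteed to be populated yet. The loop above reads author.text on both "start" and "end" events, so an author element whose text has not been read yet would have text set to None, and whether that happens could depend on where the parser's read buffer ends. The sketch below listens for "end" events only. The function name iter_author_lines and the inline sample data are made up for illustration, and the DTD/entity handling from the script above is left out to keep it short.

```python
import io
import xml.etree.ElementTree as ET

PAPER_TAGS = {'article', 'inproceedings', 'proceedings', 'book',
              'incollection', 'phdthesis', 'mastersthesis', 'www'}

def iter_author_lines(source):
    """Yield one 'name|name|' string per publication record (hypothetical helper)."""
    # Only 'end' events are requested: by then the element and all of its
    # children have been fully parsed, so author.text is populated.
    for event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag in PAPER_TAGS:
            # Guard against an author element that is genuinely empty.
            names = [a.text or '' for a in elem.findall('author')]
            yield ''.join(name + '|' for name in names)
            elem.clear()  # free memory held by the finished record

# Made-up sample standing in for dblp_part.xml.
sample = io.BytesIO(
    b"<dblp>"
    b"<article><author>A. One</author><author>B. Two</author>"
    b"<title>T1</title></article>"
    b"<inproceedings><author>C. Three</author><title>T2</title></inproceedings>"
    b"</dblp>"
)
lines = list(iter_author_lines(sample))
for line in lines:
    print(line)
```

If this is indeed the cause, it would not be a bug in ElementTree but documented behavior of "start" events, and it would also fit the observation that reordering the last two elements makes the error disappear.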