Where should an exception be caught in a "for" loop?

Time: 2014-10-16 15:52:08

Tags: python xml elementtree

I have some code that parses XML, and I would like to improve it a bit (mainly to cope with malformed XML files).

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

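# Parse incrementally; "start" and "end" events are reported per element.
context = ET.iterparse("myfile.xml", events=("start", "end"))
context = iter(context)
# The first event is the start of the root element.
event, root = next(context)
for event, elem in context:
    if event == 'start' and elem.tag == "hello":
        print("start report")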

It works fine with this valid XML:

<?xml version="1.0" ?>
<Report name="TEST" xmlns:cm="http://www.nessus.org/cm">
<hello>world</hello>
</Report>

If I make the XML invalid by removing the last tag, I get a SyntaxError exception. That is the invalid XML I want to handle.

Traceback from the run:

Traceback (most recent call last):
  File "/tmp/GetNessusScans/parsereport.py", line 12, in <module>
    for event, elem in context:
  File "<string>", line 68, in __iter__
SyntaxError: no element found: line 4, column 0

My question is: where should I put the try: to catch this exception?

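I need to parse the file linearly because of its size, and my understanding is that the for loop will eventually reach the point where a tag is missing (or mismatched). I tried some exotic code (the except clause will actually do something useful; this is just a test):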

try:
    import xml.etree.cElementTree as ET
except:
    import xml.etree.ElementTree as ET

context = ET.iterparse("myfile.xml", events=("start", "end"))
context = iter(context)
event, root = context.next()
try:
    for event, elem in context:
except SyntaxError:
    print("invalid XML")
else:
    # if we hit the description of the scan, save it
    if event == 'start' and elem.tag == "hello":
        print("start report")

But I suspect it is not correct.
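
To illustrate why the placement of the try matters, here is a minimal sketch that is not part of the original post. The generator `events` and the function `consume` are hypothetical stand-ins: `events` imitates an iterator that fails partway through, the way the parser does on broken input. The exception is raised when the for statement asks the iterator for its next item, not inside the loop body, so only a handler wrapped around the whole loop sees it.

```python
def events():
    """Hypothetical stand-in for an iterparse iterator that fails midway."""
    yield "first"
    yield "second"
    raise SyntaxError("no element found")


def consume(source):
    """Collect items until the iterator itself raises."""
    seen = []
    try:
        for item in source:
            try:
                # Only errors raised by the body would be caught here.
                seen.append(item)
            except SyntaxError:
                seen.append("caught inside the body")
    except SyntaxError as exc:
        # The failing next() call belongs to the for statement itself,
        # so the handler around the loop is the one that runs.
        seen.append("caught outside: %s" % exc)
    return seen


result = consume(events())
```

The items produced before the failure are still collected, and the last entry comes from the outer handler.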

1 answer:

Answer 0 (score: 1)

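You can put the whole parsing block inside a new try/except block. The reasoning is that any tag in the XML can be broken, not just the last one, so the error can occur anywhere during parsing.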

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
try:
    context = ET.iterparse("myfile.xml", events=("start", "end"))
    context = iter(context)
    event, root = next(context)
    for event, elem in context:
        if event == 'start' and elem.tag == "hello":
            print("start report")
except (SyntaxError, ET.ParseError) as exc:
    # Malformed XML; ET.ParseError is a subclass of SyntaxError.
    pass
except Exception:
    # Anything else, e.g. the file could not be opened.
    pass
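
For anyone who wants to try this without a file on disk, here is a self-contained sketch of the same approach. It is an illustration under assumptions, not the original poster's code: the names `scan`, `VALID` and `TRUNCATED` are made up for the example, and `io.BytesIO` stands in for `myfile.xml`. It imports `xml.etree.ElementTree` directly and uses `next(context)` rather than `context.next()`.

```python
import io
import xml.etree.ElementTree as ET

# The valid document from the question.
VALID = b"""<?xml version="1.0" ?>
<Report name="TEST" xmlns:cm="http://www.nessus.org/cm">
<hello>world</hello>
</Report>
"""
# The same document with the closing </Report> tag removed.
TRUNCATED = VALID.replace(b"</Report>\n", b"")


def scan(xml_bytes):
    """Return (number of <hello> start events seen, error or None)."""
    hello_starts = 0
    try:
        source = io.BytesIO(xml_bytes)
        context = iter(ET.iterparse(source, events=("start", "end")))
        event, root = next(context)  # start of the root element
        for event, elem in context:
            if event == "start" and elem.tag == "hello":
                hello_starts += 1
    except ET.ParseError as exc:
        # Raised by the iterator when the parser hits malformed input.
        return hello_starts, exc
    return hello_starts, None


valid_count, valid_error = scan(VALID)
broken_count, broken_error = scan(TRUNCATED)
```

Because the try wraps the whole loop, work done on elements that arrived before the error is kept, and the caller can decide what to do with a partially parsed file. Returning the exception instead of swallowing it with pass keeps the parser's message available for logging.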