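Getting better parse error messages from ElementTree

Date: 2015-01-05 12:17:02

Tags: python xml xml-parsing

If I try to parse broken XML, the exception shows the line number. Is there a way to show the XML context as well?

I would like to see the XML tags before and after the broken part.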

Example:

import xml.etree.ElementTree as ET
tree = ET.fromstring('<a><b></a>')

Exception:

Traceback (most recent call last):
  File "tmp/foo.py", line 2, in <module>
    tree = ET.fromstring('<a><b></a>')
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: mismatched tag: line 1, column 8

Something like this would be nice:

ParseError:
<a><b></a>
=====^

2 answers:

Answer 0 (score: 14)

You can write a helper function:

import sys
import io
import itertools as IT
import xml.etree.ElementTree as ET
PY2 = sys.version_info[0] == 2
StringIO = io.BytesIO if PY2 else io.StringIO

def myfromstring(content):
    try:
        tree = ET.fromstring(content)
    except ET.ParseError as err:
        lineno, column = err.position
        line = next(IT.islice(StringIO(content), lineno))
        caret = '{:=>{}}'.format('^', column)
        err.msg = '{}\n{}\n{}'.format(err, line, caret)
        raise
    return tree

myfromstring('<a><b></a>')

which yields

xml.etree.ElementTree.ParseError: mismatched tag: line 1, column 8
<a><b></a>
=======^
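
Since the question also asks for the tags before and after the broken part, the same idea can be extended to print a few surrounding lines. This is only a sketch for Python 3 with a str input: `parse_with_context` and its `context` parameter are names I made up, and it assumes the column in `err.position` is a zero-based offset (line numbers are one-based), which appears to be what expat reports.

```python
import xml.etree.ElementTree as ET


def parse_with_context(content, context=2):
    """Parse an XML string; on failure, re-raise with surrounding lines.

    `parse_with_context` and `context` are hypothetical names for this sketch.
    """
    try:
        return ET.fromstring(content)
    except ET.ParseError as err:
        # Assumption: lineno is 1-based and column is a 0-based offset.
        lineno, column = err.position
        lines = content.splitlines()
        first = max(lineno - 1 - context, 0)
        last = min(lineno + context, len(lines))
        out = [str(err)]
        for i in range(first, last):
            out.append(lines[i])
            if i == lineno - 1:
                # Put the caret directly under the reported column.
                out.append('=' * column + '^')
        err.msg = '\n'.join(out)
        raise
```

Calling `parse_with_context(text, context=1)` on a multi-line document should then raise a ParseError whose message contains the line before the error, the offending line with a caret under it, and the line after. If the reported line lies past the end of the input (for example, a missing closing tag at end of file), no caret line is added.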

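Answer 1 (score: 2)

This is not the best option, but it is quick and simple: you can just parse the ParseError message to extract the line and column, then use them to show where the problem is.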

import xml.etree.ElementTree as ET
from xml.etree.ElementTree import ParseError
my_string = '<a><b><c></b></a>'
try:
    tree = ET.fromstring(my_string)
except ParseError as e:
    formatted_e = str(e)
    line = int(formatted_e[formatted_e.find("line ") + 5: formatted_e.find(",")])
    column = int(formatted_e[formatted_e.find("column ") + 7:])
    split_str = my_string.split("\n")
    print("{}\n{}^".format(split_str[line - 1], len(split_str[line - 1][0:column])*"-"))

Note: splitting on \n is only for this example; you need to split your input in whatever way is correct for it.
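
As a possible variation, the line and column do not have to be sliced out of the message text, because ParseError also exposes them as the `position` attribute (the other answer uses it). A minimal sketch, where `show_error` is a hypothetical helper name and `splitlines()` stands in for the manual split:

```python
import xml.etree.ElementTree as ET


def show_error(xml_text):
    """Return a two-line pointer to the parse error, or None if the text parses.

    `show_error` is a hypothetical helper name for this sketch.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as e:
        # Read (line, column) directly instead of slicing str(e).
        line, column = e.position
        # splitlines() also copes with \r\n line endings.
        rows = xml_text.splitlines()
        # The reported line can lie past the end of the input.
        row = rows[line - 1] if line <= len(rows) else ''
        return '{}\n{}^'.format(row, '-' * column)
    return None


print(show_error('<a><b><c></b></a>'))
```

This avoids depending on the exact wording of the error message, which is not guaranteed to stay the same.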