Python: how to find out which XML tag is wrong

Time: 2015-08-28 13:05:14

Tags: python xml lxml

I have an XML file, and I check that it is really well-formed XML like this:

try:
    self.doc = etree.parse(attributesXMLFilePath)
except IOError:
    error_message = "Error: Couldn't find attribute XML file path {0}".format(attributesXMLFilePath)
    raise XMLFileNotFoundException(error_message)
except XMLSyntaxError:
    error_message = "The file {0} is not a good XML file, recheck please".format(attributesXMLFilePath)
    raise NotGoodXMLFormatException(error_message)

As you can see, I am catching XMLSyntaxError, which is an exception imported as follows:

from lxml.etree import XMLSyntaxError

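This works fine, but it only tells me whether or not the file is well-formed XML. I would like to ask whether there is a way to know which tag is wrong, because in a case like this: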

<name>Marco</name1>

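I get an error. Is there a way to know that the name tag has not been closed?

Update

After some people gave me the idea of using the line and position, I came up with this code: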

class XMLFileNotFoundException(GeneralSpiderException):
    def __init__(self, message):
        super(XMLFileNotFoundException, self).__init__(message, self)

class GeneralSpiderException(Exception):
    def __init__(self, message, e):
        super(GeneralSpiderException, self).__init__(message+" \nline of Exception = {0}, position of Exception = {1}".format(e.lineno, e.position))

I still raise the error like this:

raise XMLFileNotFoundException(error_message)

Now I get this error:

    super(GeneralSpiderException, self).__init__(message+" \nline of Exception = {0}, position of Exception = {1}".format(e.lineno, e.position))
exceptions.AttributeError: 'XMLFileNotFoundException' object has no attribute 'lineno'

2 answers:

Answer 0: (score: 2)

You can include the details of the error in your message. For example:

try:
    self.doc = etree.parse(attributesXMLFilePath)
except XMLSyntaxError as e:
    error_message = "The file {0} is not correct XML, {1}".format(attributesXMLFilePath, e.msg)
    raise NotGoodXMLFormatException(error_message)
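The AttributeError shown in the update appears to come from passing the new exception itself (`self`) as `e`: only the caught XMLSyntaxError carries location attributes such as `position`. A minimal sketch of one way to restructure the exception classes, assuming an optional `cause` argument (that parameter, and the `line` and `column` attributes, are hypothetical and not part of the original code):

```python
class GeneralSpiderException(Exception):
    """Base exception that appends the parser's location when a cause is given."""

    def __init__(self, message, cause=None):
        # `cause` is the exception that was caught (for example the
        # XMLSyntaxError), never the new exception being constructed.
        position = getattr(cause, "position", None)
        if position is not None:
            self.line, self.column = position
            message += "\nline of Exception = {0}, position of Exception = {1}".format(
                self.line, self.column)
        else:
            # No location available, e.g. when the file was simply not found.
            self.line = self.column = None
        super(GeneralSpiderException, self).__init__(message)


class XMLFileNotFoundException(GeneralSpiderException):
    """Raised for IOError; no cause is passed, so no location is added."""


class NotGoodXMLFormatException(GeneralSpiderException):
    """Raised for XMLSyntaxError; pass the caught error as `cause`."""
```

With these classes, the `except XMLSyntaxError as e:` branch would use `raise NotGoodXMLFormatException(error_message, e)`, while the `IOError` branch can keep `raise XMLFileNotFoundException(error_message)` unchanged.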

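Answer 1: (score: 2)

This may not be exactly what you want, but you can get the exact line and column where the error was detected from the exception: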

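import io

import lxml.etree

xml_fragment = b"<name>Marco</name1>"
#                12345678901234
try:
    # io.BytesIO works on both Python 2 and 3; the StringIO module is Python 2 only
    lxml.etree.parse(io.BytesIO(xml_fragment))
except lxml.etree.XMLSyntaxError as exc:
    # position is a (line, column) tuple for the reported error
    line, column = exc.position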

In this example, line and column will be 1 and 14, pointing at the first character of the closing tag that has no matching opening tag.
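Once the line and column are known, they can be turned into something easier to read. The following is a minimal sketch, assuming the column is 1-based as in the ruler comment above; `show_error_location` is a hypothetical helper, not part of lxml:

```python
def show_error_location(xml_text, line, column):
    """Return the offending source line with a caret under the given column.

    `line` and `column` are assumed to be 1-based, like the values
    unpacked from `exc.position`.
    """
    source_lines = xml_text.splitlines()
    if not 1 <= line <= len(source_lines):
        # The reported line is outside the text we were given.
        return ""
    offending = source_lines[line - 1]
    # Clamp the caret so it never points before or far past the line.
    caret_index = min(max(column - 1, 0), len(offending))
    return offending + "\n" + " " * caret_index + "^"


print(show_error_location("<name>Marco</name1>", 1, 14))
```

Called with the values from the example, this prints the fragment followed by a caret under the `n` of `name1`, which makes it easier to see which tag the parser was complaining about.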