I have an XML file, and I check whether it is actually well-formed XML like this:
try:
    self.doc = etree.parse(attributesXMLFilePath)
except IOError:
    error_message = "Error: Couldn't find attribute XML file path {0}".format(attributesXMLFilePath)
    raise XMLFileNotFoundException(error_message)
except XMLSyntaxError:
    error_message = "The file {0} is not a good XML file, recheck please".format(attributesXMLFilePath)
    raise NotGoodXMLFormatException(error_message)
As you can see, I'm catching XMLSyntaxError, which is an exception imported from:
from lxml.etree import XMLSyntaxError
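This works well, but it only tells me whether or not the file is well-formed XML. What I'd like to ask is whether there is a way to find out which tag is wrong, because when I do this: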
<name>Marco</name1>
I get the error. Is there a way to find out that the name tag was never closed?
After some people suggested using the line and position, I came up with this code:
class XMLFileNotFoundException(GeneralSpiderException):
    def __init__(self, message):
        super(XMLFileNotFoundException, self).__init__(message, self)

class GeneralSpiderException(Exception):
    def __init__(self, message, e):
        super(GeneralSpiderException, self).__init__(message+" \nline of Exception = {0}, position of Exception = {1}".format(e.lineno, e.position))
I still raise the error like this:
raise XMLFileNotFoundException(error_message)
And now I get this error:
super(GeneralSpiderException, self).__init__(message+" \nline of Exception = {0}, position of Exception = {1}".format(e.lineno, e.position))
exceptions.AttributeError: 'XMLFileNotFoundException' object has no attribute 'lineno'
Answer 0 (score: 2)
You can print the details of the error. For example:
try:
    self.doc = etree.parse(attributesXMLFilePath)
except XMLSyntaxError as e:
    error_message = "The file {0} is not correct XML, {1}".format(attributesXMLFilePath, e.msg)
    raise NotGoodXMLFormatException(error_message)
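Beyond the single message, lxml exceptions also carry an `error_log` whose entries record where each problem was detected. The following is only a minimal sketch of reading it; the helper name `collect_syntax_errors` is made up for illustration, and it parses an in-memory byte string instead of a file path to stay self-contained.

```python
from lxml import etree


def collect_syntax_errors(xml_bytes):
    """Return a (line, column, message) tuple for each logged parser error.

    Hypothetical helper, not part of lxml; an empty list means the input
    parsed without a syntax error.
    """
    try:
        etree.fromstring(xml_bytes)
    except etree.XMLSyntaxError as exc:
        # Each log entry describes one problem reported by the parser.
        return [(entry.line, entry.column, entry.message)
                for entry in exc.error_log]
    return []


# Print every problem the parser reported for the snippet in the question.
for line, column, message in collect_syntax_errors(b"<name>Marco</name1>"):
    print("line {0}, column {1}: {2}".format(line, column, message))
```

For a tag mismatch, the message text names both the opening and the closing tag, which is usually enough to spot the offending element.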
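Answer 1 (score: 2)
This may not be exactly what you want, but you can get the exact line and column where the error was detected from the exception: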
import lxml.etree
import StringIO

xml_fragment = "<name>Marco</name1>"
#               12345678901234
try:
    lxml.etree.parse(StringIO.StringIO(xml_fragment))
except lxml.etree.XMLSyntaxError as exc:
    line, column = exc.position
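In this example, line and column will be 1 and 14, which points to the first character of the closing tag that has no matching opening tag. The exact column reported may differ depending on the libxml2 version in use.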
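The AttributeError in the question appears because the exception passes itself (`self`) as `e`, and a freshly constructed XMLFileNotFoundException has no `lineno`. One possible way around it, sketched below under some assumptions, is to pass the caught parse error into the custom exception and read `position` only when it is present. To keep the sketch free of third-party dependencies it uses the standard library's `xml.etree.ElementTree` in place of lxml; its `ParseError` exposes a `position` tuple just like the lxml exception above. The `cause` parameter and the `parse_or_raise` helper are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)


class GeneralSpiderException(Exception):
    """Base error; appends line/column only when the cause provides them."""

    def __init__(self, message, cause=None):
        # Parse errors carry a (line, column) tuple in `position`;
        # other exceptions such as IOError simply do not have it.
        position = getattr(cause, "position", None)
        if position is not None:
            message += "\nline = {0}, column = {1}".format(*position)
        super(GeneralSpiderException, self).__init__(message)
        self.position = position


class NotGoodXMLFormatException(GeneralSpiderException):
    pass


def parse_or_raise(source, name):
    """Parse `source`, wrapping parse errors in the custom exception."""
    try:
        return ET.parse(source)
    except ET.ParseError as exc:
        # Hand over the original error so its position is preserved.
        raise NotGoodXMLFormatException(
            "The file {0} is not well-formed XML: {1}".format(name, exc), exc)


try:
    parse_or_raise(io.BytesIO(b"<name>Marco</name1>"), "example.xml")
except NotGoodXMLFormatException as err:
    print(err)
```

Because the position is looked up with `getattr`, the same base class can still be used for errors such as a missing file, where no line or column exists.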