Can etree.XMLParser in recover mode still raise a parse error?

Time: 2019-05-22 06:37:09

Tags: python unit-testing lxml

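I have a utility method that parses XML using a parser created as etree.XMLParser(recover=True). I want to cover the failure scenario in my unit tests. Apart from empty input, which throws lxml.etree.XMLSyntaxError, I can't seem to break the parser.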

My question is: is it possible to construct a StringIO/BytesIO input for this parser that makes it throw a parse error?

Here are some examples (tested with Python 3.5 and lxml 4.3.3):

from io import BytesIO
from lxml import etree


def parse(xml):
    parser = etree.XMLParser(recover=True)
    elem = etree.parse(BytesIO(xml), parser)
    print(etree.tostring(elem))


parse(b'<broken<')  # prints b'<broken/>'
parse(b'</lf|\\jf>')  # prints None
parse('<?xml encoding="ascii"?><foo>æøå</foo>'.encode('utf-8'))  # prints b'<foo/>'
parse(b'')  # Throws lxml.etree.XMLSyntaxError
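
The one failure reported above (empty input) can already be pinned down in a unit test. The following is only a minimal sketch: it assumes `parse` is changed to return the tree instead of printing it, and the test case name `ParseFailureTest` is hypothetical.

```python
import unittest
from io import BytesIO

from lxml import etree


def parse(xml):
    # Same parser setup as in the question, but the tree is returned
    # instead of printed (an assumption made for testability).
    parser = etree.XMLParser(recover=True)
    return etree.parse(BytesIO(xml), parser)


class ParseFailureTest(unittest.TestCase):  # hypothetical name
    def test_empty_input_raises(self):
        # Empty input is the failure case the question reports for recover=True.
        with self.assertRaises(etree.XMLSyntaxError):
            parse(b'')
```

It can be run with `python -m unittest` from the directory containing the test module.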

2 answers:

Answer 0 (score: 0)

If I put a NULL character at the start of any of your malformed inputs that do not raise an error, I do get an error. For example:

parse(b'\0<broken<')

produces:

Traceback (most recent call last):
  File "test.py", line 13, in <module>
    parse(b'\0<broken<')  # prints b'<broken/>'
  File "test.py", line 9, in parse
    elem = etree.parse(BytesIO(xml), parser)
  File "src/lxml/etree.pyx", line 3435, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1857, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1877, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1765, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1127, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
  File "<string>", line 1
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1
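
To see which inputs actually raise in a given environment, a small probe can try each input from the question with and without a leading NUL byte. This is a sketch only: the helper name `raises_syntax_error` is hypothetical, and the outcome may depend on the installed lxml and libxml2 versions (the traceback above comes from the setup described in the question).

```python
from io import BytesIO

from lxml import etree


def raises_syntax_error(data):
    """Return True if parsing `data` in recover mode raises XMLSyntaxError."""
    parser = etree.XMLParser(recover=True)
    try:
        etree.parse(BytesIO(data), parser)
    except etree.XMLSyntaxError:
        return True
    return False


# Inputs from the question, each tried with and without a leading NUL byte.
candidates = [b'<broken<', b'</lf|\\jf>']
for data in candidates:
    for variant in (data, b'\0' + data):
        print(repr(variant), raises_syntax_error(variant))
```

Any variant for which the probe prints True is a candidate input for the failure-path unit test.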

Answer 1 (score: 0)

Isn't that because you are using recover=True?

  

recover - try hard to parse through broken XML

I changed it to recover=False and got:

Traceback (most recent call last):
  File "./foo.py", line 11, in <module>
    parse(b'<broken<')  # prints b'<broken/>'
  File "./foo.py", line 7, in parse
    elem = etree.parse(BytesIO(xml), parser)
  File "src/lxml/etree.pyx", line 3435, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1857, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1877, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1765, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1127, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
  File "<string>", line 1
lxml.etree.XMLSyntaxError: error parsing attribute name, line 1, column 8

Am I missing something?
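
The difference between the two modes can be checked side by side. The sketch below assumes the same `BytesIO`-based parsing as in the question; the helper names `parse_root_tag` and `strict_parse_fails` are hypothetical. On the setup described in the question, the recover=True call is reported to yield a `<broken/>` root, while recover=False raises as in the traceback above.

```python
from io import BytesIO

from lxml import etree


def parse_root_tag(xml, recover):
    """Parse `xml`; return the root tag, or None if no root element was produced."""
    parser = etree.XMLParser(recover=recover)
    root = etree.parse(BytesIO(xml), parser).getroot()
    return None if root is None else root.tag


def strict_parse_fails(xml):
    """Return True if parsing with recover=False raises XMLSyntaxError."""
    try:
        parse_root_tag(xml, recover=False)
    except etree.XMLSyntaxError:
        return True
    return False


print(strict_parse_fails(b'<broken<'))
print(parse_root_tag(b'<broken<', recover=True))
```

So a test of the strict parser is easy to write, but it exercises a different configuration from the recover=True parser the question is about.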