Question

I'm using xml.etree.ElementTree.tostring() to convert an etree element to a string, but sometimes it fails:

from lxml import etree
import xml.etree.ElementTree

xpath = "..."
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
result = tree.xpath(xpath)
xml.etree.ElementTree.tostring(result[0], encoding='utf-8')

The error is:

File "../abc.py", line 165, in abc
    results.append(xml.etree.ElementTree.tostring(result[0], encoding='utf-8'))
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1127, in tostring
    ElementTree(element).write(file, encoding, method=method)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 818, in write
    self._root, encoding, default_namespace
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 887, in _namespaces
    _raise_serialization_error(tag)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1053, in _raise_serialization_error
    "cannot serialize %r (type %s)" % (text, type(text).__name__)
TypeError: cannot serialize <built-in function Comment> (type builtin_function_or_method)

How can I fix this?

Answer 1

It looks like result[0] is a comment, which you probably want to skip. Something like this should do it:

htmlparser = etree.HTMLParser(remove_comments=True)
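A minimal sketch of the effect, assuming lxml is installed. The HTML snippet and the `//div/node()` XPath are made up for illustration (the question's real XPath is elided); `node()` is used because, unlike `*`, it also matches comment nodes.

```python
from io import StringIO

from lxml import etree

# Hypothetical sample page: an HTML comment is the first node inside <div>
html = "<html><body><div><!-- ad slot --><p>Hello</p></div></body></html>"
xpath = "//div/node()"  # hypothetical XPath; node() also matches comments

# Default HTMLParser: the comment is kept and returned as result[0]
default_result = etree.parse(StringIO(html), etree.HTMLParser()).xpath(xpath)
# A comment's tag is lxml's Comment function, not a string
print(default_result[0].tag is etree.Comment)  # True

# remove_comments=True drops comments while parsing, so result[0] is the <p>
clean_parser = etree.HTMLParser(remove_comments=True)
clean_result = etree.parse(StringIO(html), clean_parser).xpath(xpath)
print(clean_result[0].tag)  # p
```

With comments stripped at parse time, the first match is an ordinary element, so the tostring call from the question no longer receives a comment in that position.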

From the lxml docs:

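ElementTree ignores comments and processing instructions when parsing XML, while etree will read them in and treat them as Comment or ProcessingInstruction elements respectively. This is especially visible where comments are found inside text content, which is then split by a Comment element.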
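To illustrate the quoted behaviour, here is a small sketch (sample XML made up) showing how lxml's default XMLParser keeps a comment and a processing instruction as tree nodes, and how to skip them after parsing instead of at parse time. The filter relies on lxml giving these nodes a function (`etree.Comment`, `etree.PI`) as their tag rather than a string.

```python
from io import BytesIO

from lxml import etree

# Hypothetical XML with a processing instruction and a comment between elements
xml = b"<root><?note keep?><!-- skip me --><a>1</a><b>2</b></root>"

# The default XMLParser keeps both as special nodes among root's children
root = etree.parse(BytesIO(xml)).getroot()

kinds = []
for child in root:
    if child.tag is etree.Comment:
        kinds.append("comment")
    elif child.tag is etree.PI:
        kinds.append("pi")
    else:
        kinds.append(child.tag)
print(kinds)  # ['pi', 'comment', 'a', 'b']

# Keep only real elements: their tag is a string, unlike comments and PIs
elements = [child for child in root if isinstance(child.tag, str)]
print([e.tag for e in elements])  # ['a', 'b']
```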

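You can disable this behaviour by passing the boolean remove_comments and/or remove_pis keyword arguments to the parser you use. For convenience and to support portable code, you can also use etree.ETCompatXMLParser instead of the default etree.XMLParser. It tries to provide a default setup that is as close as possible to the ElementTree parser.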
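A sketch of both options from the quote, assuming lxml is installed and using a made-up XML string: passing the flags explicitly to etree.XMLParser, or relying on etree.ETCompatXMLParser, whose defaults already remove comments and processing instructions.

```python
from io import BytesIO

from lxml import etree

# Hypothetical XML containing a processing instruction and a comment
xml = b"<root><?note keep?><!-- skip me --><a>1</a></root>"

# Option 1: pass the boolean flags to the parser explicitly
explicit = etree.XMLParser(remove_comments=True, remove_pis=True)
# Option 2: ETCompatXMLParser, whose defaults mimic ElementTree
compat = etree.ETCompatXMLParser()

tags_by_parser = {}
for name, parser in (("explicit", explicit), ("compat", compat)):
    root = etree.parse(BytesIO(xml), parser).getroot()
    tags_by_parser[name] = [child.tag for child in root]
print(tags_by_parser)  # {'explicit': ['a'], 'compat': ['a']}
```

For HTML input, as in the question, the equivalent is the remove_comments flag on etree.HTMLParser.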

Python: xml.etree.ElementTree.tostring error
