I'm using xml.etree.ElementTree.tostring() to convert an etree element to a string, but sometimes it fails:
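import xml.etree.ElementTree
from lxml import etree  # HTMLParser() and .xpath() come from lxml, not the standard library

xpath = "..."
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
result = tree.xpath(xpath)
xml.etree.ElementTree.tostring(result[0], encoding='utf-8')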
The error is:
File "../abc.py", line 165, in abc
results.append(xml.etree.ElementTree.tostring(result[0], encoding='utf-8'))
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1127, in tostring
ElementTree(element).write(file, encoding, method=method)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 818, in write
self._root, encoding, default_namespace
File "C:\Python27\lib\xml\etree\ElementTree.py", line 887, in _namespaces
_raise_serialization_error(tag)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1053, in _raise_serialization_error
"cannot serialize %r (type %s)" % (text, type(text).__name__)
TypeError: cannot serialize <built-in function Comment> (type builtin_function_or_method)
How can I fix this?
Answer (score: 2)
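It looks like result[0] (or an element nested inside it) is a comment, which you probably want to skip. lxml represents a comment as a node whose tag is the etree.Comment function itself, and xml.etree.ElementTree's serializer only recognises its own Comment factory, not lxml's, hence the TypeError. Something like this should do it: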
htmlparser = etree.HTMLParser(remove_comments=True)
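A minimal sketch of the failure and the fix, assuming lxml is installed and using a hypothetical SAMPLE_HTML string in place of the question's response. The helpers first_div() and stdlib_tostring() are illustrative names, not part of either library. The last line also shows a second option: lxml's own etree.tostring() understands its comment nodes, so serializing with lxml avoids mixing the two libraries altogether.

```python
import io
import xml.etree.ElementTree as ET  # stdlib serializer used in the question
from lxml import etree  # third-party: HTMLParser, parse() and .xpath() come from here

# Hypothetical sample input: a <div> whose text is interrupted by a comment.
SAMPLE_HTML = b"<html><body><div id='x'>Hello <!-- note --> world</div></body></html>"


def first_div(remove_comments):
    """Parse SAMPLE_HTML with lxml and return the first <div> found via XPath."""
    parser = etree.HTMLParser(remove_comments=remove_comments)
    tree = etree.parse(io.BytesIO(SAMPLE_HTML), parser)
    return tree.xpath("//div")[0]


def stdlib_tostring(element):
    """Serialize with xml.etree; return None instead of raising on TypeError."""
    try:
        return ET.tostring(element, encoding="utf-8")
    except TypeError as exc:
        # lxml comment nodes have .tag == etree.Comment (a function object),
        # which the stdlib serializer rejects with "cannot serialize ...".
        print("TypeError:", exc)
        return None


# Default parser: the comment is kept as a child of <div>, so this fails.
print(stdlib_tostring(first_div(remove_comments=False)))

# remove_comments=True: no comment node remains, so serialization succeeds.
print(stdlib_tostring(first_div(remove_comments=True)))

# Alternative: lxml's own serializer handles comment nodes without complaint.
print(etree.tostring(first_div(remove_comments=False), encoding="utf-8"))
```

Removing comments at parse time changes the tree (the comment text is gone), so if you need to keep comments, serializing with lxml is the less lossy choice.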
From the lxml documentation:
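ElementTree ignores comments and processing instructions when parsing XML, while etree will read them in and treat them as Comment or ProcessingInstruction elements respectively. This is especially visible where comments are found inside text content, which is then split by the Comment element.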
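You can disable this behaviour by passing the boolean remove_comments and/or remove_pis keyword arguments to the parser you use. For convenience and to support portable code, you can also use etree.ETCompatXMLParser instead of the default etree.XMLParser. It tries to provide a default setup that is as close to the ElementTree parser as possible.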
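A minimal sketch of the parser options described in the quote, assuming lxml is installed; SAMPLE_XML is a made-up document. Note that ETCompatXMLParser is an XML parser, so for HTML input like the question's, the equivalent is passing remove_comments=True (and remove_pis=True if needed) to etree.HTMLParser.

```python
import xml.etree.ElementTree as ET  # stdlib serializer, for the portability check
from lxml import etree  # third-party

# Hypothetical sample document with one comment and one processing instruction.
SAMPLE_XML = b"<root>a<!-- c -->b<?pi data?></root>"

# Default lxml parser: the comment and the PI are kept as children of <root>.
default_root = etree.fromstring(SAMPLE_XML, etree.XMLParser())

# Explicit keyword arguments, as described in the quoted documentation.
explicit_root = etree.fromstring(
    SAMPLE_XML, etree.XMLParser(remove_comments=True, remove_pis=True)
)

# ETCompatXMLParser drops comments and PIs by default, like ElementTree's parser.
compat_root = etree.fromstring(SAMPLE_XML, etree.ETCompatXMLParser())

for name, root in [("default", default_root),
                   ("explicit", explicit_root),
                   ("compat", compat_root)]:
    # len(root) counts surviving comment/PI nodes; root.text is the leading text.
    print(name, repr(root.text), len(root))

# With no comment/PI nodes left, the stdlib serializer accepts the lxml element.
print(ET.tostring(compat_root))
```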