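Commenting out an element with lxml

Date: 2017-06-07 14:53:50

Tags: python lxml

Is it possible to comment out an XML element with Python's lxml while preserving the original element's markup inside the comment? I tried the following: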

elem.getparent().replace(elem, etree.Comment(etree.tostring(elem, pretty_print=True)))

But tostring() adds a namespace declaration.

1 answer:

Answer 0 (score: 2)

The commented-out element inherits its namespace from the root element, so when it is serialized on its own, tostring() has to repeat the xmlns declaration. Demo:

from lxml import etree

XML = """
<root xmlns='foo'>
 <a>
  <b>AAA</b>
 </a>
</root>"""

root = etree.fromstring(XML)
b = root.find(".//{foo}b")
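# Serialize b (tail included) and wrap the resulting markup in a comment node
b.getparent().replace(b, etree.Comment(etree.tostring(b, encoding="unicode")))
print(etree.tostring(root, encoding="unicode"))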

Result:

<root xmlns="foo">
 <a>
  <!--<b xmlns="foo">AAA</b>
 --></a>
</root>

Manipulating namespaces is often harder than you might suspect. See https://stackoverflow.com/a/31870245/407651.

My suggestion here is to use BeautifulSoup, which in practice does not really care about namespaces (soup.find('b') returns the b element even though it is in the foo namespace).

from bs4 import BeautifulSoup, Comment

soup = BeautifulSoup(XML, "xml")
b = soup.find('b')
b.replace_with(Comment(str(b)))
print(soup.prettify())

Result:

<?xml version="1.0" encoding="utf-8"?>
<root xmlns="foo">
 <a>
  <!--<b>AAA</b>-->
 </a>
</root>
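
If BeautifulSoup is not an option, a rough lxml-only workaround is to serialize a namespace-stripped copy of the element instead of the element itself. The sketch below is an illustration under assumptions, not something taken from the linked answer: the helper name comment_out is hypothetical, and it relies on the common recipe of rewriting each tag to its local name and then calling etree.cleanup_namespaces to drop the now-unused declaration. It discards all namespace information inside the comment, which may be wrong for elements that mix several namespaces.

```python
import copy

from lxml import etree


def comment_out(elem):
    """Replace elem with a comment holding its markup, minus namespaces.

    Hypothetical helper: works on a deep copy, so the live tree keeps
    its namespaces everywhere except inside the new comment.
    """
    clone = copy.deepcopy(elem)
    clone.tail = None  # the tail is re-attached to the comment below

    # Rewrite every element tag to its local name (skip comments and PIs,
    # whose .tag is not a string).
    for node in clone.iter():
        if isinstance(node.tag, str):
            node.tag = etree.QName(node).localname

    # Remove namespace declarations that are no longer referenced.
    etree.cleanup_namespaces(clone)

    comment = etree.Comment(etree.tostring(clone, encoding="unicode"))
    comment.tail = elem.tail  # keep the original whitespace after the node
    elem.getparent().replace(elem, comment)
    return comment
```

Called as comment_out(root.find(".//{foo}b")) on a freshly parsed copy of the demo document, this should leave <!--<b>AAA</b>--> in place of the element, while the rest of the tree keeps its namespace.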