Prevent lxml from creating self-closing tags

Time: 2017-01-27 09:15:10

Tags: python lxml

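I have an (old) tool that does not understand self-closing tags such as <STATUS/>. So we need to serialize our XML files with explicit opening and closing tags, like this: <STATUS></STATUS>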

Currently I have:

>>> from lxml import etree

>>> para = """<ERROR>The status is <STATUS></STATUS>.</ERROR>"""
>>> tree = etree.XML(para)
>>> etree.tostring(tree)
'<ERROR>The status is <STATUS/>.</ERROR>'

How can I serialize with opening/closing tags instead?

<ERROR>The status is <STATUS></STATUS>.</ERROR>

Solution

Given by wildwilhelm, below:
>>> from lxml import etree

>>> para = """<ERROR>The status is <STATUS></STATUS>.</ERROR>"""
>>> tree = etree.XML(para)
>>> for status_elem in tree.xpath("//STATUS[string() = '']"):
...     status_elem.text = ""
>>> etree.tostring(tree)
'<ERROR>The status is <STATUS></STATUS>.</ERROR>'

2 answers:

Answer 0 (score: 5)

It appears that the text attribute of the <STATUS> element is set to None:

>>> tree[0]
<Element STATUS at 0x11708d4d0>
>>> tree[0].text
>>> tree[0].text is None
True

If you set the text attribute of the <STATUS> element to an empty string, you should get what you are looking for:

>>> tree[0].text = ''
>>> etree.tostring(tree)
'<ERROR>The status is <STATUS></STATUS>.</ERROR>'

With that in mind, you can walk the tree and fix up the text attributes before writing out the XML. Something like this:

# prevent creation of self-closing tags
for node in tree.iter():
    if node.text is None:
        node.text = ''
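
For reference, here is a minimal self-contained sketch of the loop above wrapped in a helper. The name expand_empty_elements is my own and is not part of lxml. Passing tag=etree.Element to iter() restricts the walk to real elements, so comments and processing instructions are left alone. encoding="unicode" is used only so the result is a str rather than bytes.

```python
from lxml import etree


def expand_empty_elements(root):
    """Give every element with no text an empty string as its text.

    lxml serializes an element whose text is None (and that has no
    children) as <TAG/>, but one whose text is '' as <TAG></TAG>.
    Hypothetical helper name; not an lxml API.
    """
    # tag=etree.Element skips comments and processing instructions
    for node in root.iter(tag=etree.Element):
        if node.text is None:
            node.text = ''
    return root


tree = etree.XML("<ERROR>The status is <STATUS/>.</ERROR>")
expand_empty_elements(tree)

# encoding="unicode" returns a str instead of bytes
serialized = etree.tostring(tree, encoding="unicode")
print(serialized)
```

Call the helper right before serializing, since any element added afterwards will again have text set to None.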

Answer 1 (score: 3)

If you treat your lxml DOM as HTML, you can use

etree.tostring(html_dom, method='html')

to prevent self-closing tags (such as <a />).
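
A small sketch of that approach, applied to the document from the question rather than to a real HTML tree (that choice is my assumption, not something the answer states). Be aware that the HTML serializer follows HTML rules: elements it knows as void, such as br, are written without any closing tag, so this only suits a tool that accepts that kind of output.

```python
from lxml import etree

root = etree.XML("<ERROR>The status is <STATUS/>.</ERROR>")

# method="html" switches to the HTML serializer, which writes
# empty elements it does not know as an open/close pair
html_out = etree.tostring(root, method="html", encoding="unicode")
print(html_out)
```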