How to insert an attribute with the correct namespace prefix using lxml

Posted: 2019-11-13 08:32:20

Tags: python xml lxml xml-namespaces

Is it possible to use lxml to insert an XML attribute with the correct namespace prefix?

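For example, I want to insert links into an XML document using XLink. All I need to do is add a {http://www.w3.org/1999/xlink}href attribute to certain elements. I would like the attribute to use the xlink prefix, but lxml generates prefixes such as "ns0", "ns1", etc.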
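For context, lxml identifies a namespaced name by its Clark notation, {uri}localname, and etree.QName is one way to build that string. The minimal sketch below (the XLINK_NS name is an illustrative choice, not from the original) shows that the prefix is only a serialization detail: an attribute written with any prefix bound to the XLink URI is looked up under the same Clark name.

```python
from lxml import etree

XLINK_NS = "http://www.w3.org/1999/xlink"

# QName joins a namespace URI and a local name into Clark notation.
href = etree.QName(XLINK_NS, "href")
print(href.text)  # {http://www.w3.org/1999/xlink}href

# Here the XLink URI is bound to the arbitrary prefix "foo"; the attribute
# is still found under the Clark name, regardless of that prefix.
doc = etree.XML('<a xmlns:foo="http://www.w3.org/1999/xlink" foo:href="x"/>')
print(doc.get(href.text))  # x
```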

Here is what I tried:

from lxml import etree

#: Name (and namespace) of the *href* attribute used to insert links.
HREF_ATTR = etree.QName("http://www.w3.org/1999/xlink", "href").text

content = """\
<body>
<p>Link to <span>StackOverflow</span></p>
<p>Link to <span>Google</span></p>
</body>
"""

targets = ["https://stackoverflow.com", "https://www.google.fr"]
body_elem = etree.XML(content)
for span_elem, target in zip(body_elem.iter("span"), targets):
    span_elem.attrib[HREF_ATTR] = target

etree.dump(body_elem)

The dump looks like this:

<body>
<p>Link to <span xmlns:ns0="http://www.w3.org/1999/xlink"
                 ns0:href="https://stackoverflow.com">StackOverflow</span></p>
<p>Link to <span xmlns:ns1="http://www.w3.org/1999/xlink"
                 ns1:href="https://www.google.fr">Google</span></p>
</body>

I found a way to declare the namespace only once, on the root element, by inserting and then deleting the attribute there:

# trick to declare the XLink namespace globally (only one time).
body_elem = etree.XML(content)
body_elem.attrib[HREF_ATTR] = ""
del body_elem.attrib[HREF_ATTR]

targets = ["https://stackoverflow.com", "https://www.google.fr"]
for span_elem, target in zip(body_elem.iter("span"), targets):
    span_elem.attrib[HREF_ATTR] = target

etree.dump(body_elem)

It's ugly, but it works, and I only need to do it once. I get:

<body xmlns:ns0="http://www.w3.org/1999/xlink">
<p>Link to <span ns0:href="https://stackoverflow.com">StackOverflow</span></p>
<p>Link to <span ns0:href="https://www.google.fr">Google</span></p>
</body>
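A small sketch of why the trick works, assuming a reduced one-link document for brevity: setting the namespaced attribute makes lxml add an xmlns:nsN declaration to the root, deleting the attribute removes only the attribute, and descendants then reuse the declaration they find on an ancestor instead of creating their own.

```python
from lxml import etree

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF_ATTR = etree.QName(XLINK_NS, "href").text

body_elem = etree.XML("<body><p>Link to <span>StackOverflow</span></p></body>")

# Add and remove the attribute: the namespace declaration stays on <body>.
body_elem.set(HREF_ATTR, "")
del body_elem.attrib[HREF_ATTR]
print(body_elem.nsmap)  # e.g. {'ns0': 'http://www.w3.org/1999/xlink'}

# The span finds the declaration on its ancestor and does not add its own.
span_elem = body_elem.find("p/span")
span_elem.set(HREF_ATTR, "https://stackoverflow.com")
print(span_elem.nsmap == body_elem.nsmap)  # True
print(etree.tostring(body_elem, encoding="unicode"))
```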

But the question remains: how do I turn this "ns0" prefix into "xlink"?

1 answer:

Answer (score: 1)

Use register_namespace, as suggested by @mzjn:

etree.register_namespace("xlink", "http://www.w3.org/1999/xlink")

# trick to declare the XLink namespace globally (only one time).
body_elem = etree.XML(content)
body_elem.attrib[HREF_ATTR] = ""
del body_elem.attrib[HREF_ATTR]

targets = ["https://stackoverflow.com", "https://www.google.fr"]
for span_elem, target in zip(body_elem.iter("span"), targets):
    span_elem.attrib[HREF_ATTR] = target

etree.dump(body_elem)

The result is what I expected:

<body xmlns:xlink="http://www.w3.org/1999/xlink">
<p>Link to <span xlink:href="https://stackoverflow.com">StackOverflow</span></p>
<p>Link to <span xlink:href="https://www.google.fr">Google</span></p>
</body>
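Note that register_namespace changes a process-wide prefix table. If that is undesirable, one alternative, sketched here under the assumption that the root element may be replaced, is to recreate the root with an explicit nsmap (lxml's nsmap is read-only on existing elements) and move the children under it. The helper with_nsmap is hypothetical, not part of lxml.

```python
from lxml import etree

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF_ATTR = etree.QName(XLINK_NS, "href").text

content = """\
<body>
<p>Link to <span>StackOverflow</span></p>
<p>Link to <span>Google</span></p>
</body>
"""


def with_nsmap(elem, extra_nsmap):
    """Hypothetical helper: return a new element like *elem* whose start tag
    also declares the prefixes in *extra_nsmap*; children are moved, not copied."""
    new_elem = etree.Element(elem.tag, attrib=dict(elem.attrib),
                             nsmap={**elem.nsmap, **extra_nsmap})
    new_elem.text = elem.text
    new_elem.tail = elem.tail
    new_elem.extend(list(elem))  # each child's tail text moves with it
    return new_elem


body_elem = with_nsmap(etree.XML(content), {"xlink": XLINK_NS})
targets = ["https://stackoverflow.com", "https://www.google.fr"]
for span_elem, target in zip(body_elem.iter("span"), targets):
    # lxml finds the xlink declaration on <body> and reuses its prefix.
    span_elem.set(HREF_ATTR, target)

print(etree.tostring(body_elem, encoding="unicode"))
```

This keeps the prefix choice local to the document instead of relying on global registration, at the cost of replacing the root object.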