More compact namespace representation in ElementTree or lxml

Date: 2013-04-12 20:20:20

Tags: python xml lxml elementtree

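I am trying to get a compact representation of namespaces in ElementTree or lxml when the child elements are in a different namespace from the parent. Here is a basic example: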

from lxml import etree

country = etree.Element("country")

name = etree.SubElement(country, "{urn:test}name")
name.text = "Canada"
population = etree.SubElement(country, "{urn:test}population")
population.text = "34M"
etree.register_namespace('tst', 'urn:test')

print( etree.tostring(country, pretty_print=True) )

I also tried this approach:

ns = {"test" : "urn:test"}

country = etree.Element("country", nsmap=ns)

name = etree.SubElement(country, "{test}name")
name.text = "Canada"
population = etree.SubElement(country, "{test}population")
population.text = "34M"

print( etree.tostring(country, pretty_print=True) )

In both cases, I get output like this:

<country>
    <ns0:name xmlns:ns0="urn:test">Canada</ns0:name>
    <ns1:population xmlns:ns1="urn:test">34M</ns1:population>
</country>

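While this is correct, I would like it to be less verbose; repeating the declaration on every element can become a real problem for large data sets (especially since I use a much longer namespace than 'urn:test').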

If I accept putting 'country' inside the "urn:test" namespace and declare it like this (in the first example above):

country = etree.Element("{test}country")

then I get the following output:

<ns0:country xmlns:ns0="urn:test">
    <ns0:name>Canada</ns0:name>
    <ns0:population>34M</ns0:population>
</ns0:country>

But what I really want is:

<country xmlns:ns0="urn:test">
    <ns0:name>Canada</ns0:name>
    <ns0:population>34M</ns0:population>
</country>

Any ideas?

3 answers:

Answer 0 (score: 2)

  1. The full name of an element has the form {namespace-url}elementName, not {prefix}elementName. The prefix itself is declared through nsmap:

    >>> from lxml import etree as ET
    >>> r = ET.Element('root', nsmap={'tst': 'urn:test'})
    >>> ET.SubElement(r, "{urn:test}child")
    <Element {urn:test}child at 0x2592a80>
    >>> ET.tostring(r)
    '<root xmlns:tst="urn:test"><tst:child/></root>'
    
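  2. In your case, an even more compact representation is possible if you change the default namespace. Unfortunately, lxml does not seem to allow an empty XML namespace, but, as you said, you can put the parent tag in the same namespace as the children. You can then set the default namespace to the children's namespace: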

    >>> r = ET.Element('{urn:test}root', nsmap={None: 'urn:test'})
    >>> ET.SubElement(r, "{urn:test}child")
    <Element {urn:test}child at 0x2592b20>
    >>> ET.SubElement(r, "{urn:test}child")
    <Element {urn:test}child at 0x25928f0>
    >>> ET.tostring(r)
    '<root xmlns="urn:test"><child/><child/></root>'
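
As a side note on point 1, the same Clark-notation rule applies to the standard library's xml.etree.ElementTree. The following is only a minimal sketch: the `clark` helper and the `NSMAP` table are hypothetical names made up for this illustration (only the 'urn:test' URI and the 'tst' prefix come from this thread), and `register_namespace` changes process-wide state.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix table for this sketch; only "urn:test" comes from the question.
NSMAP = {"tst": "urn:test"}


def clark(prefixed_name, nsmap=NSMAP):
    """Turn 'tst:name' into '{urn:test}name'; names without a prefix pass through."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep:
        return prefixed_name
    return "{%s}%s" % (nsmap[prefix], local)


# Tell the serializer which prefix to use for each namespace URI (global setting).
for prefix, uri in NSMAP.items():
    ET.register_namespace(prefix, uri)

country = ET.Element("country")
ET.SubElement(country, clark("tst:name")).text = "Canada"
ET.SubElement(country, clark("tst:population")).text = "34M"

# encoding="unicode" returns a str without an XML declaration.
xml_text = ET.tostring(country, encoding="unicode")
print(xml_text)
```

Writing tags as 'tst:name' and expanding them in one place makes it harder to slip a prefix into the braces by accident, which is what happened with "{test}name" in the question.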
    

Answer 1 (score: 1)

This code:

from lxml import etree

ns = {"ns0" : "urn:test"}
country = etree.Element("country", nsmap=ns)

name = etree.SubElement(country, "{urn:test}name")
name.text = "Canada"

population = etree.SubElement(country, "{urn:test}population")
population.text = "34M"

print(etree.tostring(country, pretty_print=True))

seems to give the desired output:

<country xmlns:ns0="urn:test">
  <ns0:name>Canada</ns0:name>
  <ns0:population>34M</ns0:population>
</country>

But you still need to maintain the nsmap yourself.

Answer 2 (score: 1)

from xml.etree import ElementTree as ET  # cElementTree was removed in Python 3.9
##ET.register_namespace('tst', 'urn:test')
country = ET.Element("country")
name = ET.SubElement(country, "{urn:test}name")
name.text = "Canada"
population = ET.SubElement(country, "{urn:test}population")
population.text = "34M"
print(prettify(country))  # prettify is the helper mentioned in the note at the end

The code above gives (without registering any namespace):

<?xml version="1.0" ?>
<country xmlns:ns0="urn:test">
  <ns0:name>Canada</ns0:name>
  <ns0:population>34M</ns0:population>
</country>

And when I uncomment the register_namespace line, it gives:

<?xml version="1.0" ?>
<country xmlns:tst="urn:test">
  <tst:name>Canada</tst:name>
  <tst:population>34M</tst:population>
</country>

Note: the prettify function is here
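
The link to prettify did not survive in this copy. A common way to write such a helper goes through xml.dom.minidom; the sketch below assumes that approach and is not necessarily the function the answer linked to. The name `prettify` matches the call in the code above, while the `indent` parameter is an addition of this sketch.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def prettify(elem, indent="  "):
    """Return an indented XML string for an ElementTree element."""
    # Serialize with ElementTree, then re-parse with minidom to pretty-print.
    rough = ET.tostring(elem, encoding="utf-8")
    return minidom.parseString(rough).toprettyxml(indent=indent)


country = ET.Element("country")
ET.SubElement(country, "{urn:test}name").text = "Canada"
ET.SubElement(country, "{urn:test}population").text = "34M"

pretty = prettify(country)
print(pretty)
```

ElementTree collects every namespace used in the tree and declares it once on the root element when serializing, which is why the declaration appears only on country here.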