Question

Given the following XML:

<node a='1' b='1'>
   <subnode x='25'/>
</node>

I want to extract the tag name and all attributes of the first node, i.e. the verbatim code:

<node a='1' b='1'>

without the child nodes.

For example, in Python, tostring returns too much:

from lxml import etree
root = etree.fromstring("<node a='1' b='1'><subnode x='25'>some text</subnode></node>")
print(etree.tostring(root))

returns

b'<node a="1" b="1"><subnode x="25">some text</subnode></node>'

The following gives the desired result, but it is too verbose:

tag = root.tag
for att, val in root.attrib.items():
    tag += ' '+att+'="'+val+'"'
tag = '<'+tag+'>'
print(tag)

Result:

<node a="1" b="1">

Is there a simpler way to do this (one that also guarantees the attribute order is preserved)?

Answer 1

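You can remove all the child nodes before serializing. Note that an element with no children serializes in self-closing form, ending in /> rather than >.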

from lxml import etree

root = etree.fromstring("<node a='1' b='1'><subnode x='25'>some text</subnode></node>")
for subnode in root.xpath("//subnode"):
    subnode.getparent().remove(subnode)

print(etree.tostring(root))  # b'<node a="1" b="1"/>' (bytes in Python 3)
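The XPath above only matches elements named subnode, and it modifies root in place. As a minimal sketch of a more general variant, the hypothetical helper opening_tag below (not part of lxml) works on a deep copy, removes every child regardless of its name, and then rewrites the trailing /> as >. It assumes the tree is small enough that copying it is acceptable.

```python
import copy
from lxml import etree

def opening_tag(elem):
    """Return only the start tag of elem as a str (hypothetical helper)."""
    clone = copy.deepcopy(elem)      # leave the caller's tree untouched
    for child in list(clone):        # elements, comments and PIs alike
        clone.remove(child)
    clone.text = None                # drop direct text so the tag self-closes
    markup = etree.tostring(clone, encoding="unicode", with_tail=False)
    # An empty element serializes as "<node .../>"; swap "/>" for ">".
    return markup[:-2] + ">" if markup.endswith("/>") else markup

root = etree.fromstring("<node a='1' b='1'><subnode x='25'>some text</subnode></node>")
print(opening_tag(root))  # <node a="1" b="1">
```

Because the work happens on a copy, root still contains its subnode afterwards.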

Alternatively, you can use a simple regular expression. The attribute order is preserved.

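import re
# Here root is the freshly parsed element, with its children still attached.
# Serialize to str rather than bytes so that a str pattern works in Python 3.
res = re.search('<(.*?)>', etree.tostring(root, encoding='unicode'))
res.group(1)  # 'node a="1" b="1"' (lxml serializes with double quotes)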
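If serializing the whole subtree just to match its first tag feels wasteful, the loop from the question can also be condensed. The sketch below is one possible way, not an lxml feature: start_tag is a hypothetical helper name, quoteattr comes from the standard library and handles escaping and quoting of the values, and the code assumes the element uses no namespaces (lxml would otherwise report the tag in {uri}local form). In practice lxml yields attributes in document order.

```python
from xml.sax.saxutils import quoteattr
from lxml import etree

def start_tag(elem):
    """Build the start tag from tag name and attributes (hypothetical helper)."""
    # quoteattr escapes special characters and adds the surrounding quotes.
    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in elem.attrib.items())
    return f"<{elem.tag}{attrs}>"

root = etree.fromstring("<node a='1' b='1'><subnode x='25'/></node>")
print(start_tag(root))  # <node a="1" b="1">
```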
