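I have a less-than-ideal XML document with hyphens in its tag names, and I want to replace them with underscores (so that I can use lxml.objectify). I want to rename every tag, including those of nested children.
Example XML: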
<job>
<server>
<cpu-set>
</cpu-set>
</server>
<ip-routings>
</ip-routings>
</job>
I'd like to turn this XML into the following in a clean way, without regular expressions and using an XML library such as lxml:
<job>
<server>
<cpu_set>
</cpu_set>
</server>
<ip_routings>
</ip_routings>
</job>
What is a Pythonic, clean way to do this?
Answer 0 (score: 4)
Use XPath to find the elements whose names contain a hyphen, then rewrite their tags:
from lxml import etree
data = """<job>
<server>
<cpu-set>
</cpu-set>
</server>
<ip-routings>
</ip-routings>
</job>"""
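doc = etree.XML(data)
# Select every element whose local name (the tag without its namespace) contains a hyphen
for e in doc.xpath('//*[contains(local-name(),"-")]'):
    # Note: for a namespaced element e.tag is '{uri}local', so a hyphen
    # inside the namespace URI would be replaced as well
    e.tag = e.tag.replace('-', '_')
# Python 3: print() is a function, and encoding="unicode" returns str instead of bytes
print(etree.tostring(doc, encoding="unicode"))
Output: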
<job>
<server>
<cpu_set>
</cpu_set>
</server>
<ip_routings>
</ip_routings>
</job>
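A variant of the same idea, offered as a sketch: instead of XPath it walks `root.iter()` and edits only the local part of each Clark-notation tag (`{namespace-uri}local-name`), so a namespace URI that happens to contain a hyphen is left alone. It then re-parses the result with `lxml.objectify`, which is what the question was after. The helper name `underscore_tags` and the sample value `4` inside `cpu-set` are illustrative additions, not part of the question.

```python
from lxml import etree, objectify

data = b"""<job>
  <server>
    <cpu-set>4</cpu-set>
  </server>
  <ip-routings>
  </ip-routings>
</job>"""


def underscore_tags(root):
    """Hypothetical helper: replace '-' with '_' in the local part of every
    element tag, leaving any namespace URI untouched."""
    for el in root.iter():
        # Comments and processing instructions have non-string tags; skip them
        if not isinstance(el.tag, str):
            continue
        # '{uri}local' -> ('{uri', '}', 'local'); 'local' -> ('', '', 'local')
        ns, brace, local = el.tag.rpartition("}")
        el.tag = ns + brace + local.replace("-", "_")
    return root


root = underscore_tags(etree.fromstring(data))

# Re-parse the cleaned document with objectify so children are
# reachable as Python attributes
job = objectify.fromstring(etree.tostring(root))
print(job.server.cpu_set.text)  # 4
```

Round-tripping through `tostring()` is the least clever option, but it keeps the renaming on a plain `etree` tree and hands `objectify` a document whose names are already valid Python identifiers.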
Answer 1 (score: 1)
I know this isn't Python, but it feels Pythonic to me: C# with the csharp interpreter from Mono:
using System.Xml.Linq;
var doc = XDocument.Load(Console.In);
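// Reverse() buffers the sequence and visits each element's descendants
// before the element itself, so a parent copies already-renamed children
foreach(var node in doc.Descendants().Reverse())
    node.ReplaceWith(new XElement(
        node.Name.Namespace + node.Name.LocalName.Replace("-","_"),
        node.Attributes(),
        node.Nodes()));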
doc.Save(Console.Out);
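This keeps namespaces, attributes and mixed content intact, which is hard to get right without relying on an existing XML library:
input.xml: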
<?xml version="1.0"?>
<job xmlns:ex="test">
<server attr1="first" attr2="second">
<ex:cpu-set>
</ex:cpu-set>
</server>
<ip-routings>
contained <mixed/>text
</ip-routings>
</job>
csharp -r:System.Xml.Linq test < input.xml
Output:
<?xml version="1.0" encoding="utf-8"?>
<job xmlns:ex="test">
<server attr1="first" attr2="second">
<ex:cpu_set />
</server>
<ip_routings>
contained <mixed />text
</ip_routings>
</job>
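For comparison, here is a minimal sketch of the same rename in Python with lxml, run on the `input.xml` content above (inlined as a byte string rather than read from a file). Because only `el.tag` is reassigned in place, the attributes, the `ex:` prefix and the mixed content are expected to survive just as in the C# version. One difference: lxml keeps whitespace-only text by default, so `cpu_set` should keep its newline body instead of being collapsed to an empty element.

```python
from lxml import etree

# Same document as input.xml above, inlined for a self-contained example
data = b"""<?xml version="1.0"?>
<job xmlns:ex="test">
<server attr1="first" attr2="second">
<ex:cpu-set>
</ex:cpu-set>
</server>
<ip-routings>
contained <mixed/>text
</ip-routings>
</job>"""

root = etree.fromstring(data)
for el in root.iter():
    if isinstance(el.tag, str):  # skip comments and processing instructions
        # Rename only the local part; the namespace (and thus the ex: prefix) is kept
        ns, brace, local = el.tag.rpartition("}")
        el.tag = ns + brace + local.replace("-", "_")

out = etree.tostring(root, encoding="unicode")
print(out)
```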