I have an XML file like this:
<car>Ferrari</car>
<color>red</color>
<speed>300</speed>
<car>Porsche</car>
<color>black</color>
<speed>310</speed>
I need to get it into this form:
<car name="Ferrari">
<color>red</color>
<speed>300</speed>
</car>
<car name="Porsche">
<color>black</color>
<speed>310</speed>
</car>
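How can I do this? I'm struggling because I can't think of a way to build the structure I need from the flat list of tags in the original XML file.
My language of choice is Python, but any suggestions are welcome.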
Answer 0 (score: 7)
XSLT is the perfect tool for transforming one XML structure into another.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- copy the root element and handle its <car> children -->
<xsl:template match="/root">
<xsl:copy>
<xsl:apply-templates select="car" />
</xsl:copy>
</xsl:template>
<!-- car elements become a container for their properties -->
<xsl:template match="car">
<car name="{normalize-space()}">
<!-- ** see 1) -->
<xsl:copy-of select="following-sibling::color[1]" />
<xsl:copy-of select="following-sibling::speed[1]" />
</car>
</xsl:template>
</xsl:stylesheet>
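1) For this to work, your XML must have a <color> and a <speed> for every <car>. If that is not guaranteed, or if the number and kind of properties is generally variable, replace the two lines with the generic form of the copy statement:
<!-- any following-sibling element that "belongs" to the same <car> -->
<xsl:copy-of select="following-sibling::*[
not(self::car) and
generate-id(preceding-sibling::car[1]) = generate-id(current())
]" />
The not(self::car) test keeps the next <car> itself from being copied, since its nearest preceding <car> sibling is also the current one.
Applied to your XML (I implied a document element named <root>), this would be the result:
<root>
<car name="Ferrari">
<color>red</color>
<speed>300</speed>
</car>
<car name="Porsche">
<color>black</color>
<speed>310</speed>
</car>
</root>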
Sample code for applying XSLT to XML in Python should be easy to find, so I omit it here. It takes barely 4 or 5 lines of Python code.
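For completeness, here is one possible way to run the stylesheet above from Python. This is only a sketch and it assumes the third-party lxml package is installed (the standard library has no XSLT processor); the names XSLT_SOURCE, XML_SOURCE and apply_stylesheet are made up for this example, and the input is wrapped in the implied <root> element.

```python
from lxml import etree  # assumption: third-party package, e.g. `pip install lxml`

# The stylesheet from this answer (fixed-property variant).
XSLT_SOURCE = b"""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/root">
    <xsl:copy>
      <xsl:apply-templates select="car" />
    </xsl:copy>
  </xsl:template>
  <xsl:template match="car">
    <car name="{normalize-space()}">
      <xsl:copy-of select="following-sibling::color[1]" />
      <xsl:copy-of select="following-sibling::speed[1]" />
    </car>
  </xsl:template>
</xsl:stylesheet>
"""

# The flat input from the question, wrapped in the implied <root> element.
XML_SOURCE = b"""\
<root>
  <car>Ferrari</car>
  <color>red</color>
  <speed>300</speed>
  <car>Porsche</car>
  <color>black</color>
  <speed>310</speed>
</root>
"""


def apply_stylesheet(xml_bytes, xslt_bytes):
    """Parse both documents and return the transformed result tree."""
    transform = etree.XSLT(etree.XML(xslt_bytes))
    return transform(etree.XML(xml_bytes))


result = apply_stylesheet(XML_SOURCE, XSLT_SOURCE)
print(etree.tostring(result, pretty_print=True).decode())
```

To read from files instead, etree.parse(path) could replace the two etree.XML(...) calls.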
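Answer 1 (score: 1)
I don't know Python, but assuming you have an XML parser that gives you hierarchical access to the nodes in an XML document, the semantics you want would be something like the following (warning: I tend to use PHP). Basically, store any non-"car" tags, and when you encounter a new "car" tag, treat it as a delimiting field and create the assembled XML node: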
// Create an input and output handle
input_handle = parse_xml_document();
output_handle = new_xml_document();
// Assuming the <car>, <color> etc. nodes are the children
// of some root element, get them as an array
list_of_nodes = input_handle.get_list_child_nodes();
// These are empty variables for storing our data as we parse it
var car = NULL, color = NULL, speed = NULL
foreach(list_of_nodes as node)
{
  if(node.tag_name() == "color")
  {
    color = node.value();
  }
  if(node.tag_name() == "speed")
  {
    speed = node.value();
    // etc for each type of non-delimiting field
  }
  if(node.tag_name() == "car")
  {
    // If there's already a car stored, take its data and
    // insert it into the output xml structure
    if(car != NULL)
    {
      // Add a new child node to the output document
      // (a separate variable, so the loop's node is not overwritten)
      out_node = output_handle.append_child_node("car");
      // Set the attribute on this new output node
      out_node.set_attribute("name", car);
      // Add the stored child attributes
      out_node.add_child("color", color);
      out_node.add_child("speed", speed);
    }
    // Replace the value of car afterwards. This allows the
    // first iteration to happen when there is no stored value
    // for "car".
    car = node.value();
  }
}
// The last car has no following "car" tag to trigger the
// write, so add it after the loop in the same way
if(car != NULL)
{
  out_node = output_handle.append_child_node("car");
  out_node.set_attribute("name", car);
  out_node.add_child("color", color);
  out_node.add_child("speed", speed);
}
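Since the question asked for Python, the buffering idea above could be sketched roughly as follows with the standard library's xml.etree.ElementTree. This is an illustration rather than a tested port of the pseudocode: the function name regroup and its inner helper flush are invented here, and the input is assumed to sit under a single root element.

```python
import xml.etree.ElementTree as ET


def regroup(flat_root):
    """Buffer property values and emit one nested <car> per <car> delimiter."""
    out_root = ET.Element(flat_root.tag)
    car_name = None  # name of the car currently being collected
    fields = []      # (tag, text) pairs seen since the last <car>

    def flush():
        # Write the buffered car, if there is one, into the output tree.
        if car_name is None:
            return
        car = ET.SubElement(out_root, "car", name=car_name)
        for tag, text in fields:
            ET.SubElement(car, tag).text = text

    for node in flat_root:
        if node.tag == "car":
            flush()  # a new <car> closes the previous one
            car_name = (node.text or "").strip()
            fields = []  # properties seen before the first <car> are dropped
        else:
            fields.append((node.tag, node.text))
    flush()  # the last car has no following <car> to trigger the write
    return out_root


flat = ET.fromstring(
    "<root><car>Ferrari</car><color>red</color><speed>300</speed>"
    "<car>Porsche</car><color>black</color><speed>310</speed></root>"
)
print(ET.tostring(regroup(flat), encoding="unicode"))
```

Collecting the properties in a list rather than in one variable per tag means the sketch does not need to know the property names in advance.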
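Answer 2 (score: 0)
IF your actual data is as simple as your example and contains no errors, you can do it in one pass with a regular-expression substitution: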
import re
guff = """
<car>Ferrari</car>
<color>red</color>
<speed>300</speed>
<car>Porsche</car>
<color>black</color>
<speed>310</speed>
"""
pattern = r"""
<car>([^<]+)</car>\s*
<color>([^<]+)</color>\s*
<speed>([^<]+)</speed>\s*
"""
repl = r"""<car name="\1">
<color>\2</color>
<speed>\3</speed>
</car>
"""
regex = re.compile(pattern, re.VERBOSE)
output = regex.sub(repl, guff)
print(output)
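Otherwise you would be better off reading 3 lines at a time, doing some validation, and writing out one "car" element at a time, using either string processing or ElementTree.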
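A rough sketch of that fallback with ElementTree might look like the following. It reads three elements at a time rather than three text lines, which assumes the flat tags have first been wrapped in a root element so they can be parsed. The function name group_in_threes and the specific checks (tag order, numeric speed) are assumptions chosen for illustration, not requirements from the question.

```python
import xml.etree.ElementTree as ET

EXPECTED_TAGS = ("car", "color", "speed")  # assumed fixed order per record


def group_in_threes(flat_root):
    """Validate the flat children three at a time and nest each triple."""
    children = list(flat_root)
    if len(children) % 3 != 0:
        raise ValueError("expected a multiple of 3 elements, got %d" % len(children))
    out_root = ET.Element(flat_root.tag)
    for i in range(0, len(children), 3):
        triple = children[i:i + 3]
        tags = tuple(el.tag for el in triple)
        if tags != EXPECTED_TAGS:
            raise ValueError("unexpected tags %r at position %d" % (tags, i))
        name, color, speed = ((el.text or "").strip() for el in triple)
        if not speed.isdigit():  # assumed rule: speed is a plain integer
            raise ValueError("non-numeric speed %r for car %r" % (speed, name))
        car = ET.SubElement(out_root, "car", name=name)
        ET.SubElement(car, "color").text = color
        ET.SubElement(car, "speed").text = speed
    return out_root


flat = ET.fromstring(
    "<root><car>Ferrari</car><color>red</color><speed>300</speed>"
    "<car>Porsche</car><color>black</color><speed>310</speed></root>"
)
print(ET.tostring(group_in_threes(flat), encoding="unicode"))
```

Unlike the regular expression, this version fails with an error on malformed input instead of silently leaving it unchanged.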
Answer 3 (score: 0)
Assuming the first element in the root is a car element, and all non-car elements "belong" to the last car element before them:
# xml.etree.cElementTree was removed in Python 3.9; ElementTree is the same API
import xml.etree.ElementTree as etree
root = etree.XML('''<root>
<car>Ferrari</car>
<color>red</color>
<speed>300</speed>
<car>Porsche</car>
<color>black</color>
<speed>310</speed>
</root>''')
new_root = etree.Element('root')
for elem in root:
    if elem.tag == 'car':
        car = etree.SubElement(new_root, 'car', name=elem.text)
    else:
        car.append(elem)
new_root will be:
<root><car name="Ferrari"><color>red</color>
<speed>300</speed>
</car><car name="Porsche"><color>black</color>
<speed>310</speed>
</car></root>
(I assume that pretty whitespace is not important.)