Creating structure in a flat XML file

Date: 2010-07-05 09:59:00

Tags: python xml

I have an XML file like this:

<car>Ferrari</car>
<color>red</color>
<speed>300</speed>
<car>Porsche</car>
<color>black</color>
<speed>310</speed>

I need to get it into this form:

<car name="Ferrari">
    <color>red</color>
    <speed>300</speed>
</car>
<car name="Porsche">
    <color>black</color>
    <speed>310</speed>
</car>

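How can I do this? I'm struggling because I can't think of a way to build the structure I need from the flat list of tags in the original XML file.

My language of choice is Python, but any suggestions are welcome.

4 answers:

Answer 0 (score: 7)

XSLT is the perfect tool for transforming one XML structure into another.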

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- copy the root element and handle its <car> children -->
  <xsl:template match="/root">
    <xsl:copy>
      <xsl:apply-templates select="car" />
    </xsl:copy>
  </xsl:template>

  <!-- car elements become a container for their properties -->
  <xsl:template match="car">
    <car name="{normalize-space()}">
      <!-- ** see 1) -->
      <xsl:copy-of select="following-sibling::color[1]" />
      <xsl:copy-of select="following-sibling::speed[1]" />
    </car>
  </xsl:template>
</xsl:stylesheet>

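1) For this to work, every <car> in your XML must be followed by its own <color> and <speed>. If that is not guaranteed, or if the number and type of properties is generally variable, replace those two lines with the generic form of the copy statement:

      <!-- any following-sibling element that "belongs" to the same <car> -->
      <xsl:copy-of select="following-sibling::*[not(self::car)][
        generate-id(preceding-sibling::car[1]) = generate-id(current())
      ]" />

Applied to your XML (I assumed a document element named <root>), this gives the following result:

<root>
  <car name="Ferrari">
    <color>red</color>
    <speed>300</speed>
  </car>
  <car name="Porsche">
    <color>black</color>
    <speed>310</speed>
  </car>
</root>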

Sample code for applying XSLT to XML in Python should be easy to find, so I've left it out here. It only takes 4 or 5 lines of Python.
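For reference, here is a minimal sketch of those few lines. It assumes the third-party lxml package is installed (the standard library has no XSLT processor). It also assumes the flat elements are wrapped in a <root> element, as in the answer. The stylesheet is embedded as a bytes string, and the constant names XSLT_SOURCE and FLAT_XML are only illustrative.

```python
from lxml import etree  # third-party: pip install lxml (assumed available)

# The stylesheet from this answer, embedded as bytes.
XSLT_SOURCE = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/root">
    <xsl:copy>
      <xsl:apply-templates select="car" />
    </xsl:copy>
  </xsl:template>
  <xsl:template match="car">
    <car name="{normalize-space()}">
      <xsl:copy-of select="following-sibling::color[1]" />
      <xsl:copy-of select="following-sibling::speed[1]" />
    </car>
  </xsl:template>
</xsl:stylesheet>"""

# Assumption: the flat elements are wrapped in a single <root> element.
FLAT_XML = b"""<root>
<car>Ferrari</car>
<color>red</color>
<speed>300</speed>
<car>Porsche</car>
<color>black</color>
<speed>310</speed>
</root>"""

# Compile the stylesheet once, then apply it to the parsed input.
transform = etree.XSLT(etree.XML(XSLT_SOURCE))
result = transform(etree.XML(FLAT_XML))

if __name__ == "__main__":
    print(str(result))  # serialized result document
```

Compiling the stylesheet once and reusing the transform object is convenient if you have many files to convert.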

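Answer 1 (score: 1)

I don't know Python, but assuming you have an XML parser that gives you hierarchical access to the nodes in an XML document, the logic you want looks something like the pseudocode below (warning: I tend to write PHP). Basically, store the values of any non-"car" tags, and when you reach a new "car" tag, treat it as a delimiter and create the assembled XML node: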

// Create an input and output handle
input_handle = parse_xml_document();
output_handle = new_xml_document();

// Assuming the <car>, <color> etc. nodes are
// the children of some root element, get them as an array
list_of_nodes = input_handle.get_list_child_nodes();

// These are empty variables for storing our data as we parse it
var car = NULL, color = NULL, speed = NULL;

foreach(list_of_nodes as node)
{
  if(node.tag_name() == "color")
  {
    color = node.value();
  }

  if(node.tag_name() == "speed")
  {
    speed = node.value();
    // etc for each type of non-delimiting field
  }

  if(node.tag_name() == "car")
  {
    // If there's already a car specified, take its data and
    // insert it into the output XML structure before moving on
    if(car != NULL)
    {
      // Add a new child node to the output document
      car_node = output_handle.append_child_node("car");
      // Set the name attribute from the stored car value
      car_node.set_attribute("name", car);
      // Add the stored child fields
      car_node.add_child("color", color);
      car_node.add_child("speed", speed);
    }

    // Replace the value of car afterwards. This allows the
    // first iteration to happen when there is no stored value
    // for "car".
    car = node.value();
  }
}

// The last car is never followed by another "car" tag,
// so write it out after the loop
if(car != NULL)
{
  car_node = output_handle.append_child_node("car");
  car_node.set_attribute("name", car);
  car_node.add_child("color", color);
  car_node.add_child("speed", speed);
}
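A rough Python translation of that delimiter idea, using the standard library's xml.etree.ElementTree, might look like this. It is a sketch and assumes the flat elements are wrapped in a single <root> element. The helper names emit_car and group_cars are made up for this example. Like the pseudocode, it buffers one value per field tag and writes out a car only when the next <car> or the end of the input is reached.

```python
import xml.etree.ElementTree as ET

# Assumption: the flat elements are wrapped in one root element,
# since several top-level elements would not be well-formed XML.
FLAT_XML = """<root>
<car>Ferrari</car>
<color>red</color>
<speed>300</speed>
<car>Porsche</car>
<color>black</color>
<speed>310</speed>
</root>"""


def emit_car(out, name, fields):
    """Append one grouped <car name="..."> element with its buffered fields to out."""
    car_el = ET.SubElement(out, "car", name=name)
    for tag, text in fields.items():
        ET.SubElement(car_el, tag).text = text


def group_cars(flat_root):
    """Treat each <car> as a delimiter: buffer the fields that follow it and
    emit the grouped element when the next <car> (or the end) is reached."""
    out = ET.Element("root")
    car = None    # name of the car whose fields are being collected
    fields = {}   # tag -> text for that car (one value per tag)
    for node in flat_root:
        if node.tag == "car":
            if car is not None:
                emit_car(out, car, fields)
            car = (node.text or "").strip()
            fields = {}
        else:
            fields[node.tag] = node.text
    # The last car has no following <car>, so flush it here.
    if car is not None:
        emit_car(out, car, fields)
    return out


if __name__ == "__main__":
    grouped = group_cars(ET.fromstring(FLAT_XML))
    print(ET.tostring(grouped, encoding="unicode"))
```

A dict keeps the fields in document order (Python 3.7+). It also handles properties other than color and speed without extra branches, which the pseudocode needs one variable per field for.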

Answer 2 (score: 0)

IF your real data is as simple as your example and contains no errors, you can do it in one pass with a regular-expression substitution:

import re

guff = """
<car>Ferrari</car>
<color>red</color>
<speed>300</speed>
<car>Porsche</car>
<color>black</color>
<speed>310</speed>
"""

pattern = r"""
<car>([^<]+)</car>\s*
<color>([^<]+)</color>\s*
<speed>([^<]+)</speed>\s*
"""

repl = r"""<car name="\1">
    <color>\2</color>
    <speed>\3</speed>
</car>
"""

regex = re.compile(pattern, re.VERBOSE)
output = regex.sub(repl, guff)
print(output)

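Otherwise, you're better off reading 3 lines at a time, doing some validation, and writing out one "car" element at a time, using either string processing or ElementTree.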
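Here is a rough sketch of that three-lines-at-a-time approach using the standard library's re and xml.etree.ElementTree. It assumes each element sits on its own line with plain text content, so there are no attributes, entities, or nested tags. It also assumes every group is exactly car, color, speed in that order. The names group_by_three, LINE_RE, and EXPECTED are made up for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: one element per line with plain text content, e.g. <color>red</color>
LINE_RE = re.compile(r"^<(\w+)>([^<]*)</\1>$")
EXPECTED = ("car", "color", "speed")  # assumed fixed order within each group


def group_by_three(lines):
    """Validate the input in (car, color, speed) triples and yield one
    grouped <car name="..."> element per triple."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    if len(lines) % 3:
        raise ValueError("number of lines is not a multiple of 3")
    for i in range(0, len(lines), 3):
        parsed = []
        for ln in lines[i:i + 3]:
            m = LINE_RE.match(ln)
            if not m:
                raise ValueError(f"unexpected line: {ln!r}")
            parsed.append(m.groups())
        tags = tuple(tag for tag, _ in parsed)
        if tags != EXPECTED:
            raise ValueError(f"expected {EXPECTED}, got {tags}")
        (_, name), (_, color), (_, speed) = parsed
        car = ET.Element("car", name=name)
        ET.SubElement(car, "color").text = color
        ET.SubElement(car, "speed").text = speed
        yield car


if __name__ == "__main__":
    flat = """<car>Ferrari</car>
<color>red</color>
<speed>300</speed>
<car>Porsche</car>
<color>black</color>
<speed>310</speed>
"""
    for car in group_by_three(flat.splitlines()):
        print(ET.tostring(car, encoding="unicode"))
```

Because it is a generator, it can stream a large file line by line. Malformed groups raise ValueError instead of being silently skipped, as they would be with the regex substitution.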

Answer 3 (score: 0)

Assuming the first element in the root is a car element, and that every non-car element "belongs" to the most recent car:

# cElementTree was removed in Python 3.9; ElementTree uses the C accelerator automatically
import xml.etree.ElementTree as etree

root = etree.XML('''<root>
<car>Ferrari</car>
<color>red</color>
<speed>300</speed>
<car>Porsche</car>
<color>black</color>
<speed>310</speed>
</root>''')

new_root = etree.Element('root')

for elem in root:
    if elem.tag == 'car':
        car = etree.SubElement(new_root, 'car', name=elem.text)
    else:
        car.append(elem)

new_root will then be:

<root><car name="Ferrari"><color>red</color>
<speed>300</speed>
</car><car name="Porsche"><color>black</color>
<speed>310</speed>
</car></root>

(I assume the pretty-printing whitespace doesn't matter.)
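If the indentation does matter, one option on Python 3.9+ is to run ElementTree's indent() function on the result. It replaces whitespace-only text and tails, such as the "\n" tails carried over from the input, with consistent indentation. The sketch below wraps the same grouping loop in a function. The name group_and_indent and the 4-space indent are choices made for this example, and the loop keeps the answer's assumption that the first element is a <car>.

```python
import xml.etree.ElementTree as ET  # ET.indent requires Python 3.9+

FLAT_XML = """<root>
<car>Ferrari</car>
<color>red</color>
<speed>300</speed>
<car>Porsche</car>
<color>black</color>
<speed>310</speed>
</root>"""


def group_and_indent(xml_text, space="    "):
    """Group the flat elements under their <car> and return indented XML text."""
    root = ET.fromstring(xml_text)
    new_root = ET.Element("root")
    car = None
    for elem in root:
        if elem.tag == "car":
            car = ET.SubElement(new_root, "car", name=elem.text)
        else:
            car.append(elem)  # assumes a <car> has already been seen
    # Overwrite whitespace-only text/tails with consistent indentation.
    ET.indent(new_root, space=space)
    return ET.tostring(new_root, encoding="unicode")


if __name__ == "__main__":
    print(group_and_indent(FLAT_XML))
```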