Using the ElementTree module to separate XML content

Posted: 2015-09-02 10:36:57

Tags: python xml elementtree celementtree

I am trying to parse an XML file and arrange it into a table that separates the content into isElement, isAttribute, Value and Text.

How can I do this with the ElementTree module? I know it is possible with the minidom module.

I want to use ElementTree because it is more efficient. The problem I am trying to solve can be found here: those from within psych::principal

Any suggestions on how to use the ElementTree module to split XML content into elements, subelements, and so on?

Here is what I have done so far:

import xml.etree.cElementTree as ET

filetree = ET.ElementTree(file = "some_file.xml")
for child in filetree.iter():
     print child.tag, child.text, child.attrib

For the following sample XML file:

    <?xml version="1.0"?>
    <data>
        <country name="Liechtenstein">
            <rank>1</rank>
            <year>2008</year>
            <gdppc>141100</gdppc>
            <neighbor name="Austria" direction="E"/>
            <neighbor name="Switzerland" direction="W"/>
        </country>
        <country name="Singapore">
            <rank>4</rank>
            <year>2011</year>
            <gdppc>59900</gdppc>
            <neighbor name="Malaysia" direction="N"/>
        </country>
        <country name="Panama">
            <rank>68</rank>
            <year>2011</year>
            <gdppc>13600</gdppc>
            <neighbor name="Costa Rica" direction="W"/>
            <neighbor name="Colombia" direction="E"/>
        </country>
    </data>

I get this output:

    data 
         {}
    country 
             {'name': 'Liechtenstein'}
    rank 1 {}
    year 2008 {}
    gdppc 141100 {}
    neighbor None {'direction': 'E', 'name': 'Austria'}
    neighbor None {'direction': 'W', 'name': 'Switzerland'}
    country 
             {'name': 'Singapore'}
    rank 4 {}
    year 2011 {}
    gdppc 59900 {}
    neighbor None {'direction': 'N', 'name': 'Malaysia'}
    country 
             {'name': 'Panama'}
    rank 68 {}
    year 2011 {}
    gdppc 13600 {}
    neighbor None {'direction': 'W', 'name': 'Costa Rica'}
    neighbor None {'direction': 'E', 'name': 'Colombia'}

I did find something similar in another post, but it uses the DOM module: http://python.zirael.org/e-gtk-treeview4.html

Based on the comments I received, this is what I want to achieve:

    data (type Element)
         country(Element)
              Text = None
              name(Attribute)
                 value: Liechtenstein
              rank(Element)
                  Text = 1
              year(Element)
                  Text = 2008
              gdppc(Element)
                  Text = 141100
              neighbor(Element)
                  name(Attribute)
                      value: Austria
                  direction(Attribute)
                      value: E
              neighbor(Element)
                  name(Attribute)
                      value: Switzerland
                  direction(Attribute)
                      value: W

         country(Element)
              Text = None
              name(Attribute)
                 value: Singapore
              rank(Element)
                  Text = 4

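I want to be able to display my data as a tree like the one above. To do that, I need to keep track of the parent/child relationships. I hope this clarifies the question.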

2 answers:

Answer 0 (score: 1)

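Element objects are sequences that contain their direct child elements. XML attributes are stored in a dictionary that maps attribute names to values. Unlike the DOM, there are no text nodes: text is stored in the text and tail attributes. Text inside an element but before its first child element is stored in text, and text after an element's closing tag, up to the next element, is stored in that element's tail. So if we take the gtk-treeview4-2.py example from TreeView IV. - display of trees, we have to rewrite this DOM code: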

# ...
import xml.dom.minidom as dom
# ...

    def create_interior(self):
        # ...
        doc = dom.parse(self.filename)
        self.add_element_to_treestore(doc.childNodes[0], None)
        # ...

    def add_element_to_treestore(self, e, parent):
        if isinstance(e, dom.Element):
            me = self.model.append(parent, [e.nodeName, 'ELEMENT', ''])
            for i in range(e.attributes.length):
                a = e.attributes.item(i)
                self.model.append(me, ['@' + a.name, 'ATTRIBUTE', a.value])
            for ch in e.childNodes:
                self.add_element_to_treestore(ch, me)
        elif isinstance(e, dom.Text):
            self.model.append(
                parent, ['text()', 'TEXT_NODE', e.nodeValue.strip()])

into the following, which uses ElementTree:

# ...
from xml.etree import ElementTree as etree
# ...

    def create_interior(self):
        # ...
        doc = etree.parse(self.filename)
        self.add_element_to_treestore(doc.getroot())
        # ...

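    def add_element_to_treestore(self, element, parent=None):
        # One row for the element itself
        path = self.model.append(parent, [element.tag, 'ELEMENT', ''])
        # attrib is a plain dict mapping attribute names to values
        for name, value in sorted(element.attrib.items()):
            self.model.append(path, ['@' + name, 'ATTRIBUTE', value])
        # Text inside the element, before its first child
        if element.text:
            self.model.append(
                path, ['text()', 'TEXT_NODE', element.text.strip()]
            )
        # Iterating over an Element yields its direct children
        for child in element:
            self.add_element_to_treestore(child, path)
            # Text after the child's closing tag belongs to this element
            if child.tail:
                self.model.append(
                    path, ['text()', 'TEXT_NODE', child.tail.strip()]
                )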
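For a quick check without GTK, the same recursion can emit indented text lines in roughly the format sketched in the question. This is a minimal sketch, not part of the original answer: walk and format_tree are hypothetical helper names, and unlike the code above it skips whitespace-only text so that the indentation of the file does not produce empty "Text =" rows.

```python
import xml.etree.ElementTree as ET


def walk(element, depth=0):
    """Hypothetical helper: yield (depth, label) pairs in the same order
    add_element_to_treestore appends rows (element, attributes, text,
    then each child followed by that child's tail)."""
    yield depth, element.tag + "(Element)"
    for name, value in sorted(element.attrib.items()):
        yield depth + 1, name + "(Attribute)"
        yield depth + 2, "value: " + value
    text = (element.text or "").strip()
    if text:  # skip whitespace-only text (indentation, newlines)
        yield depth + 1, "Text = " + text
    for child in element:
        yield from walk(child, depth + 1)
        tail = (child.tail or "").strip()
        if tail:  # tail text belongs to the parent, at the child's level
            yield depth + 1, "Text = " + tail


def format_tree(xml_text, indent="    "):
    """Hypothetical helper: parse an XML string and return the indented tree."""
    root = ET.fromstring(xml_text)
    return "\n".join(indent * depth + label for depth, label in walk(root))


sample = """<data>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""

print(format_tree(sample))
```

Passing the depth down the recursion is what keeps track of the relationships: every row knows how many ancestors it has, so the indentation mirrors the nesting in the file.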

The screenshot shows the sample data with the first subtree fully expanded:

[Screenshot of example data]

Update: added a screenshot of the sample data and the relevant import lines to the code.
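To see the text/tail split described at the start of this answer on a small input, here is a short stdlib-only check. The fragment is made up for illustration; it only exercises the text, tail and attrib attributes and iteration over an Element.

```python
import xml.etree.ElementTree as ET

# Made-up fragment with text before, between and after the child elements
root = ET.fromstring('<p id="intro">before <b>bold</b> between <i>italic</i> after</p>')

print(repr(root.text))   # 'before '  (text before the first child)
for child in root:       # iterating an Element yields its direct children
    # the text after each child's closing tag is that child's tail
    print(child.tag, repr(child.text), repr(child.tail))
# b 'bold' ' between '
# i 'italic' ' after'
print(root.attrib)       # {'id': 'intro'}  (attributes are a plain dict)
```

Note that " between " hangs off the b element rather than p, which is why the tree code appends each child's tail right after that child, under the parent.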

Answer 1 (score: 0)

This may not be exactly what you need, but you can use XSLT to transform the XML into a tree-like structure:

XSLT (with tabs and line breaks):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8"/>

<xsl:template match="data">

<xsl:variable name="tabonce"><xsl:text>&#10;&#x9;</xsl:text></xsl:variable>
<xsl:variable name="tabtwice"><xsl:text>&#10;&#x9;&#x9;</xsl:text></xsl:variable>

<data>
    data (type Element)<xsl:text>&#10;&#x9;</xsl:text>
    <xsl:for-each select="country">
           <xsl:value-of select="concat(local-name(.), '(Element)')"/>
           Text = <xsl:value-of select="concat('None', $tabonce)"/> 
           <xsl:value-of select="concat(name(@*), '(Attribute)')"/>
              value: <xsl:value-of select="concat(@*, $tabonce)"/>          

        <xsl:for-each select="*">
        <xsl:value-of select="concat(local-name(.), '(Element)')"/>     
              Text = <xsl:value-of select="concat(., $tabonce)"/> 

              <xsl:if test="@*">
                 <xsl:text>&#x9;</xsl:text><xsl:value-of select="concat(name(@name), '(Attribute)')"/>
                 value: <xsl:value-of select="concat(@name, $tabtwice)"/>  
                 <xsl:value-of select="concat(name(@direction), '(Attribute)')"/>
                 value: <xsl:value-of select="concat(@direction, $tabonce)"/> 
              </xsl:if>

        </xsl:for-each>
        <xsl:text>&#10;&#x9;</xsl:text>

    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
</data>    

</xsl:template>
</xsl:stylesheet>

Python script using the lxml module:

import lxml.etree as ET

# Raw strings stop the Windows backslashes from being read as escape sequences
dom = ET.parse(r'C:\Path\To\XMLfile.xml')
xslt = ET.parse(r'C:\Path\To\XSLfile.xsl')
transform = ET.XSLT(xslt)
newdom = transform(dom)

# With an encoding given, tostring() returns bytes
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
print(tree_out.decode('utf-8'))

with open(r'C:\Path\To\OutputPath.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)

XML output:

<?xml version='1.0' encoding='UTF-8'?>
<data>
    data (type Element)
    country(Element)
        Text = None
    name(Attribute)
        value: Liechtenstein
    rank(Element)       
        Text = 1
    year(Element)       
        Text = 2008
    gdppc(Element)      
        Text = 141100
    neighbor(Element)       
        Text = 
        name(Attribute)
            value: Austria
        direction(Attribute)
            value: E
    neighbor(Element)       
        Text = 
        name(Attribute)
            value: Switzerland
        direction(Attribute)
            value: W

    country(Element)
        Text = None
    name(Attribute)
        value: Singapore
    rank(Element)       
        Text = 4
    year(Element)       
        Text = 2011
    gdppc(Element)      
        Text = 59900
    neighbor(Element)       
        Text = 
        name(Attribute)
            value: Malaysia
        direction(Attribute)
            value: N

    country(Element)
        Text = None
    name(Attribute)
        value: Panama
    rank(Element)       
        Text = 68
    year(Element)       
        Text = 2011
    gdppc(Element)      
        Text = 13600
    neighbor(Element)       
        Text = 
        name(Attribute)
            value: Costa Rica
        direction(Attribute)
            value: W
    neighbor(Element)       
        Text = 
        name(Attribute)
            value: Colombia
        direction(Attribute)
            value: E


</data>