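I am trying to parse an XML file and arrange it into a table that separates the content into isElement, isAttribute, Value and Text.
How can I do this with the ElementTree module? I know it is possible with the minidom module.
The reason I want to use ElementTree is its efficiency. The problem I am trying to solve can be found here: those from within psych::principal
Any suggestions on how to break the XML content into elements, sub-elements, etc. using the ElementTree module?
This is what I have so far: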
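# cElementTree was removed in Python 3.9; ElementTree is the drop-in replacement.
import xml.etree.ElementTree as ET

filetree = ET.ElementTree(file="some_file.xml")
# iter() walks the whole tree in document order, starting at the root.
for child in filetree.iter():
    print(child.tag, child.text, child.attrib)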
For the following sample XML file:
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
I get this as output:
data
{}
country
{'name': 'Liechtenstein'}
rank 1 {}
year 2008 {}
gdppc 141100 {}
neighbor None {'direction': 'E', 'name': 'Austria'}
neighbor None {'direction': 'W', 'name': 'Switzerland'}
country
{'name': 'Singapore'}
rank 4 {}
year 2011 {}
gdppc 59900 {}
neighbor None {'direction': 'N', 'name': 'Malaysia'}
country
{'name': 'Panama'}
rank 68 {}
year 2011 {}
gdppc 13600 {}
neighbor None {'direction': 'W', 'name': 'Costa Rica'}
neighbor None {'direction': 'E', 'name': 'Colombia'}
I did find something similar in another post, but it uses the DOM module: http://python.zirael.org/e-gtk-treeview4.html
Based on the comments I received, this is what I want to achieve:
data (type Element)
    country(Element)
        Text = None
        name(Attribute)
            value: Liechtenstein
        rank(Element)
            Text = 1
        year(Element)
            Text = 2008
        gdppc(Element)
            Text = 141100
        neighbor(Element)
            name(Attribute)
                value: Austria
            direction(Attribute)
                value: E
        neighbor(Element)
            name(Attribute)
                value: Switzerland
            direction(Attribute)
                value: W
    country(Element)
        Text = None
        name(Attribute)
            value: Singapore
        rank(Element)
            Text = 4
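I would like to be able to present my data in a tree like the one above. To do that, I need to keep track of the relationships between the nodes. I hope this clarifies the question.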
Answer 0 (score: 1)
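Element objects are sequences containing their direct child elements. XML attributes are stored in a dictionary that maps attribute names to values. There are no text nodes as in DOM. Text is stored in the text and tail attributes: text inside an element but before its first child element is stored in text, and text between the end of that element and the next element is stored in tail. So if we take the gtk-treeview4-2.py example from "TreeView IV. - display of trees", we have to rewrite this DOM code: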
# ...
import xml.dom.minidom as dom
# ...
    def create_interior(self):
        # ...
        doc = dom.parse(self.filename)
        self.add_element_to_treestore(doc.childNodes[0], None)
    # ...
    def add_element_to_treestore(self, e, parent):
        if isinstance(e, dom.Element):
            me = self.model.append(parent, [e.nodeName, 'ELEMENT', ''])
            for i in range(e.attributes.length):
                a = e.attributes.item(i)
                self.model.append(me, ['@' + a.name, 'ATTRIBUTE', a.value])
            for ch in e.childNodes:
                self.add_element_to_treestore(ch, me)
        elif isinstance(e, dom.Text):
            self.model.append(
                parent, ['text()', 'TEXT_NODE', e.nodeValue.strip()])
as the following ElementTree version:
# ...
from xml.etree import ElementTree as etree
# ...
    def create_interior(self):
        # ...
        doc = etree.parse(self.filename)
        self.add_element_to_treestore(doc.getroot())
    # ...
    def add_element_to_treestore(self, element, parent=None):
        path = self.model.append(parent, [element.tag, 'ELEMENT', ''])
        # items() works on both Python 2 and 3 (iteritems() is Python 2 only).
        for name, value in sorted(element.attrib.items()):
            self.model.append(path, ['@' + name, 'ATTRIBUTE', value])
        # Text between this element's start tag and its first child.
        if element.text:
            self.model.append(
                path, ['text()', 'TEXT_NODE', element.text.strip()]
            )
        for child in element:
            self.add_element_to_treestore(child, path)
        # Text following this element's end tag; shown under this element's row.
        if element.tail:
            self.model.append(
                path, ['text()', 'TEXT_NODE', element.tail.strip()]
            )
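Screenshot showing the sample data with the first subtree fully expanded:
Update: added a screenshot of the sample data and the relevant import lines to the code.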
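To make the text/tail distinction described in this answer concrete, here is a minimal sketch. The mixed-content snippet is hypothetical and not taken from the question's file; it only illustrates where ElementTree puts each piece of text.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content snippet, chosen only to show text vs. tail.
root = ET.fromstring("<p>before<b>bold</b>after</p>")
b = root[0]  # elements behave like sequences of their direct children

print(repr(root.text))  # text inside <p> before its first child
print(repr(b.text))     # text inside <b>
print(repr(b.tail))     # text after </b>, up to the next tag
print(repr(root.tail))  # nothing follows the root element
```

This should print 'before', 'bold', 'after' and None, in that order. In the question's sample file the only text and tail values besides the numbers are the whitespace between tags, which is why the code above strips them.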
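Answer 1 (score: 0)
This may not be exactly what you need, but you can transform the XML with XSLT to get a tree-like structure:
XSLT (includes tabs and line breaks)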
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8"/>
<xsl:template match="data">
<xsl:variable name="tabonce"><xsl:text>&#xa;&#x9;</xsl:text></xsl:variable>
<xsl:variable name="tabtwice"><xsl:text>&#xa;&#x9;&#x9;</xsl:text></xsl:variable>
<data>
data (type Element)<xsl:text>&#xa;&#x9;</xsl:text>
<xsl:for-each select="country">
<xsl:value-of select="concat(local-name(.), '(Element)')"/>
Text = <xsl:value-of select="concat('None', $tabonce)"/>
<xsl:value-of select="concat(name(@*), '(Attribute)')"/>
value: <xsl:value-of select="concat(@*, $tabonce)"/>
<xsl:for-each select="*">
<xsl:value-of select="concat(local-name(.), '(Element)')"/>
Text = <xsl:value-of select="concat(., $tabonce)"/>
<xsl:if test="@*">
<xsl:text>&#x9;</xsl:text><xsl:value-of select="concat(name(@name), '(Attribute)')"/>
value: <xsl:value-of select="concat(@name, $tabtwice)"/>
<xsl:value-of select="concat(name(@direction), '(Attribute)')"/>
value: <xsl:value-of select="concat(@direction, $tabonce)"/>
</xsl:if>
</xsl:for-each>
<xsl:text>&#xa;&#x9;</xsl:text>
</xsl:for-each>
<xsl:text>&#xa;</xsl:text>
</data>
</xsl:template>
</xsl:stylesheet>
Python script using the lxml module:
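import lxml.etree as ET

# Raw strings keep the backslashes in Windows paths literal.
dom = ET.parse(r'C:\Path\To\XMLfile.xml')
xslt = ET.parse(r'C:\Path\To\XSLfile.xsl')
transform = ET.XSLT(xslt)
newdom = transform(dom)

# tostring() returns bytes when an encoding is given, so decode before printing.
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
print(tree_out.decode('UTF-8'))

# Write the bytes unchanged; the with block closes the file automatically.
with open(r'C:\Path\To\OutputPath.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)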
XML output
<?xml version='1.0' encoding='UTF-8'?>
<data>
data (type Element)
country(Element)
Text = None
name(Attribute)
value: Liechtenstein
rank(Element)
Text = 1
year(Element)
Text = 2008
gdppc(Element)
Text = 141100
neighbor(Element)
Text =
name(Attribute)
value: Austria
direction(Attribute)
value: E
neighbor(Element)
Text =
name(Attribute)
value: Switzerland
direction(Attribute)
value: W
country(Element)
Text = None
name(Attribute)
value: Singapore
rank(Element)
Text = 4
year(Element)
Text = 2011
gdppc(Element)
Text = 59900
neighbor(Element)
Text =
name(Attribute)
value: Malaysia
direction(Attribute)
value: N
country(Element)
Text = None
name(Attribute)
value: Panama
rank(Element)
Text = 68
year(Element)
Text = 2011
gdppc(Element)
Text = 13600
neighbor(Element)
Text =
name(Attribute)
value: Costa Rica
direction(Attribute)
value: W
neighbor(Element)
Text =
name(Attribute)
value: Colombia
direction(Attribute)
value: E
</data>
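For comparison, a similar listing can be built with the standard library alone. The following is a rough sketch rather than either answerer's code: the function name describe, the four-space indent, and the trimmed sample document are all assumptions made for illustration. It walks the tree recursively, so each element's attributes, text and children stay grouped under their parent.

```python
import xml.etree.ElementTree as ET


def describe(element, depth=0):
    """Return indented lines for an element, its text, attributes and children."""
    pad = "    " * depth
    lines = [f"{pad}{element.tag} (Element)"]
    # Whitespace-only text (the newlines between tags) is reported as None.
    text = (element.text or "").strip()
    lines.append(f"{pad}    Text = {text or None}")
    for name, value in element.attrib.items():
        lines.append(f"{pad}    {name} (Attribute)")
        lines.append(f"{pad}        value: {value}")
    # Recursing with depth + 1 is what preserves the parent/child relationship.
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


# Trimmed-down version of the question's sample file (assumption for brevity).
SAMPLE = """<data>
  <country name="Liechtenstein">
    <rank>1</rank>
    <neighbor name="Austria" direction="E"/>
  </country>
</data>"""

root = ET.fromstring(SAMPLE)
lines = describe(root)
print("\n".join(lines))
```

To run it against a file instead, replace ET.fromstring(SAMPLE) with ET.parse("some_file.xml").getroot(). Returning a list of lines rather than printing inside the function keeps the traversal reusable, for example to fill a table or a tree widget.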