Iterating over an XML document

Date: 2016-05-10 15:32:43

Tags: python xml dictionary

I have a file with this structure:

<?xml version="1.0" encoding="UTF-8"?>
<entries>
  <entry>
    <term>word_1</term>
    <opinion source="data1" polarity="0.10" />
    <opinion source="data2" polarity="0.4" />
  </entry>
  <entry>
    <term>word_2</term>
    <opinion source="data1" polarity="1.0" />
    <opinion source="data2" polarity="-0.16666667" />
    <opinion source="data3" polarity="0.004" />
  </entry>
  <entry>
    <term>word_3</term>
    <opinion source="data1" polarity="0.6" />
    <opinion source="data2" polarity="0.0" />
  </entry>
</entries>

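I have never worked with XML before, and it is proving to be a pain. I want to extract the words, their polarity and their source. Ideally, for this example, I would end up with three dictionaries named after the sources (I know exactly how many different sources there are, so naming the dictionaries by hand is not a problem), each holding the words as key and the polarity as value:

data1 = {'word1':0.10, 'word2':1.0, 'word3':0.6}
data2 = {'word1':0.4, 'word2':-0.16666667, 'word3':0.0}
data3 = {'word2':0.004}

The problem is that I don't really understand how to iterate over this structure. I can iterate over the <term> elements like this:

import xml.etree.ElementTree as ET
tree = ET.parse('my.xml')
root = tree.getroot()
for term in root.iter('term'):
    print(term.text)

Out:
word_1
word_2
word_3

but I can't get at the source (or polarity) attributes. Any help is appreciated. Thanks.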

2 answers:

Answer 0 (score: 2)

Take a look at this; I think you should be able to see how it works.

import xml.etree.ElementTree as ET

data = {}
tree = ET.parse('test.xml')
root = tree.getroot()

for entry in root.iter('entry'):
    term = entry.find('term')
    for opinion in entry.iter('opinion'):
        # one inner dict per source, created the first time the source is seen
        termDict = data.setdefault(opinion.get('source'), {})
        # attribute values are strings; convert polarity to a number
        termDict[term.text] = float(opinion.get('polarity'))

for k, v in data.items():
    print(k, v)
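A self-contained variant of the answer above, as a sketch assuming Python 3. It inlines the sample XML instead of reading test.xml, converts polarity to float (the question's target dicts hold numbers, while .get() returns strings), and swaps setdefault for collections.defaultdict, which does the same job. The helper name polarities_by_source is hypothetical. Note that the keys come out as word_1, word_2, ... exactly as written in the <term> elements, not word1 as in the question's sketch.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample data from the question, inlined so the sketch runs without
# 'test.xml'; the <?xml ...?> declaration is left out for simplicity.
XML = """<entries>
  <entry>
    <term>word_1</term>
    <opinion source="data1" polarity="0.10" />
    <opinion source="data2" polarity="0.4" />
  </entry>
  <entry>
    <term>word_2</term>
    <opinion source="data1" polarity="1.0" />
    <opinion source="data2" polarity="-0.16666667" />
    <opinion source="data3" polarity="0.004" />
  </entry>
  <entry>
    <term>word_3</term>
    <opinion source="data1" polarity="0.6" />
    <opinion source="data2" polarity="0.0" />
  </entry>
</entries>"""


def polarities_by_source(root):
    """Hypothetical helper: return {source: {term: polarity}}."""
    data = defaultdict(dict)  # a source seen for the first time gets a fresh inner dict
    for entry in root.iter('entry'):
        term = entry.findtext('term')  # text of this entry's <term> child
        for opinion in entry.findall('opinion'):
            # Attribute values are strings, so convert polarity to float.
            data[opinion.get('source')][term] = float(opinion.get('polarity'))
    return dict(data)


data = polarities_by_source(ET.fromstring(XML))
# The three sources are known in advance, so they can be unpacked by name.
data1, data2, data3 = data['data1'], data['data2'], data['data3']
print(data1)  # {'word_1': 0.1, 'word_2': 1.0, 'word_3': 0.6}
print(data3)  # {'word_2': 0.004}
```

To read the real file, pass ET.parse('test.xml').getroot() to the helper instead of ET.fromstring(XML).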

Answer 1 (score: 1)

You want something like this:

import xml.etree.ElementTree
e = xml.etree.ElementTree.parse('test.xml').getroot()
for node in e.iter('entry'):  # iterate over each entry node
    for child in node:
        print(child.tag)  # get the name of the child
        if child.tag == 'opinion':  # <term> has no source/polarity attributes
            print(child.attrib['polarity'], child.attrib['source'])  # get the polarity and source

child.attrib gives you a dictionary of that particular node's attributes.
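A small sketch of that point, assuming Python 3 and one <entry> inlined as a string rather than read from a file. .attrib is an ordinary dict, so an element without attributes such as <term> gives {}, and indexing it with ['polarity'] raises KeyError; any loop over all children of <entry> therefore has to skip <term> or use Element.get(), which accepts a default.

```python
import xml.etree.ElementTree as ET

# One <entry> from the question, inlined as a string for illustration.
entry = ET.fromstring(
    '<entry><term>word_2</term>'
    '<opinion source="data1" polarity="1.0" />'
    '<opinion source="data3" polarity="0.004" /></entry>'
)

for child in entry:
    # .attrib is an ordinary dict of the element's attributes.
    print(child.tag, child.attrib)

term_attrs = entry.find('term').attrib               # {}: <term> has no attributes
first = entry.find('opinion')                        # find() returns the first match
source = first.attrib['source']                      # dict indexing: KeyError if absent
polarity = float(first.get('polarity'))              # .get() returns the string value
missing = entry.find('term').get('polarity', 'n/a')  # default instead of KeyError
```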