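Using ElementTree, is there a simple way to parse an entire XML document except for the text of nodes that have certain attribute values? As an example, I want to parse the file below, skipping anything with the attribute name="Liechtenstein" and anything with the attribute month="08".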
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <language>english</language>
        <currency>1.21$/kg</currency>
        <gdppc month="06">141100</gdppc>
        <gdpnp month="10">2.304e+0150</gdpnp>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <language>english</language>
        <currency>4.1$/kg</currency>
        <gdppc month="05">59900</gdppc>
        <gdpnp month="08">5.2e-015</gdpnp>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Lahore">
        <rank updated="yes">8</rank>
        <language>Pertr</language>
        <currency>7.3$/kg</currency>
        <gdppc month="010">34000</gdppc>
        <gdpnp month="099">3.4e+015</gdpnp>
        <neighbor name="Peru" direction="N"/>
    </country>
</data>
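Based on the above, I want to get back the following values: 5, english, 4.1$/kg, 59900, 8, Pertr, 7.3$/kg, 34000, 3.4e+015. I think iterparse could be used for this, but I don't know how to approach the problem.
Thanks for any suggestions.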
Answer 0 (score: 1)
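Using the xml.etree.ElementTree module:
Parse the XML content.
Use the getiterator() method to iterate over only the country tags. It was removed in Python 3.9; iter() is the replacement.
Use the getchildren() method to iterate over the children of each selected country tag. It was also removed in Python 3.9; iterating over the element directly does the same thing.
Code: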
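import xml.etree.ElementTree as PARSER

# `data` is the XML document from the question, held in a string.
root = PARSER.fromstring(data)
result = []
# iter() replaces getiterator(), which was removed in Python 3.9.
for i in root.iter("country"):
    # Skip countries without a name and the excluded country.
    if "name" in i.attrib and i.attrib["name"] not in ["Liechtenstein"]:
        tmp = []
        # Iterating over an element yields its direct children (replaces getchildren()).
        for j in i:
            # Skip children whose month attribute is excluded.
            if j.attrib.get("month") in ["08"]:
                continue
            # Elements such as <neighbor/> have no text.
            if j.text:
                tmp.append(j.text)
        result.append(tmp)
print("result:-", result)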
Output:
:~/workspace/vivek$ python test3.py
result:- [['5', 'english', '4.1$/kg', '59900'], ['8', 'Pertr', '7.3$/kg', '34000', '3.4e+015']]
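The question asks for one flat sequence of values rather than one list per country. A minimal sketch of that variant follows, using only the standard library. The helper name collect_text, its parameters, and the shortened SAMPLE document are made up for illustration. Unlike the code above, a country without a name attribute is kept here.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample in the same shape as the question's XML.
SAMPLE = """<data>
  <country name="Liechtenstein">
    <rank updated="yes">2</rank>
    <gdppc month="06">141100</gdppc>
  </country>
  <country name="Singapore">
    <rank updated="yes">5</rank>
    <gdppc month="05">59900</gdppc>
    <gdpnp month="08">5.2e-015</gdpnp>
    <neighbor name="Malaysia" direction="N"/>
  </country>
</data>"""


def collect_text(xml_text, skip_names=("Liechtenstein",), skip_months=("08",)):
    """Return the text of every country child that passes both filters, as one flat list."""
    root = ET.fromstring(xml_text)
    values = []
    for country in root.iter("country"):
        # Drop the whole country when its name is excluded.
        if country.get("name") in skip_names:
            continue
        for child in country:
            # Drop a single child when its month is excluded.
            if child.get("month") in skip_months:
                continue
            # Empty elements such as <neighbor/> have text None.
            if child.text and child.text.strip():
                values.append(child.text.strip())
    return values


print(collect_text(SAMPLE))  # ['5', '59900']
```

Passing the names and months as parameters keeps the two filters in one place, so more excluded values can be added without touching the loops.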
Using the lxml module:
Use the xpath() method to select the target country tags.
Code:
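from lxml import etree as PARSER

# `data` is the XML document from the question, held in a string.
# lxml.etree is the XML parser; lxml.html would treat the input as HTML.
root = PARSER.fromstring(data)
result = []
# The XPath predicate drops the excluded country up front.
countries = root.xpath("//country[@name!='Liechtenstein']")
print("debug 1 list of country:", countries)
for i in countries:
    tmp = []
    for j in i:
        # Skip children whose month attribute is excluded.
        if j.attrib.get("month") in ["08"]:
            continue
        # Elements such as <neighbor/> have no text.
        if j.text:
            tmp.append(j.text)
    result.append(tmp)
print("result:-", result)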
Result:
:~/workspace/vtestproject$ python test3.py
debug 1 list of country: [<Element country at 0xb724da04>, <Element country at 0xb7257cac>]
result:- [['5', 'english', '4.1$/kg', '59900'], ['8', 'Pertr', '7.3$/kg', '34000', '3.4e+015']]
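The question mentions iterparse. A rough sketch of that route is below, for the case where the file is too large to load in one go. It waits for the "end" event of each country element, filters its children, then clears the element to release memory. The generator name iter_country_text, its parameters, and the SAMPLE document are assumptions for illustration, not part of either approach above.

```python
import io
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample in the same shape as the question's XML.
SAMPLE = """<data>
  <country name="Liechtenstein"><rank updated="yes">2</rank></country>
  <country name="Singapore">
    <rank updated="yes">5</rank>
    <gdpnp month="08">5.2e-015</gdpnp>
  </country>
  <country name="Lahore">
    <rank updated="yes">8</rank>
    <gdpnp month="099">3.4e+015</gdpnp>
  </country>
</data>"""


def iter_country_text(source, skip_names=("Liechtenstein",), skip_months=("08",)):
    """Yield the kept text values one by one while the document is being parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # A country is complete (all children parsed) once its end tag is seen.
        if elem.tag != "country":
            continue
        if elem.get("name") not in skip_names:
            for child in elem:
                if child.get("month") in skip_months:
                    continue
                if child.text and child.text.strip():
                    yield child.text.strip()
        # Release the children of a country that has been handled.
        elem.clear()


# iterparse accepts a filename or a file object; BytesIO stands in for a file here.
print(list(iter_country_text(io.BytesIO(SAMPLE.encode("utf-8")))))
# ['5', '8', '3.4e+015']
```

For a document as small as the one in the question, fromstring() is simpler; iterparse mainly pays off when memory is a concern.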