Is there a more elegant, Pythonic way to get certain elements from the same subtree of an XML tree than iterating with nested loops and ifs?
I.e., in pseudo-SQL:
select UsageStatistic/PageViews/PerUser/Value from Tree where UsageStatistic/TimeRange/Days=7
Below is a well-formed subset of an XML response from Alexa Amazon AWIS, saved as alexa_response.xml:
<?xml version="1.0" encoding="utf-8"?>
<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/"><Response><OperationRequest><RequestId>dsfadf</RequestId></OperationRequest><UrlInfoResult><Alexa>
<TrafficData>
<DataUrl type="canonical">yahoo.com</DataUrl>
<UsageStatistics>
<UsageStatistic>
<TimeRange>
<Days>7</Days>
</TimeRange>
<Rank>
<Value>5</Value>
<Delta>0</Delta>
</Rank>
<Reach>
<Rank>
<Value>5</Value>
<Delta>0</Delta>
</Rank>
<PerMillion>
<Value>111,200</Value>
<Delta>-0.49%</Delta>
</PerMillion>
</Reach>
<PageViews>
<PerMillion>
<Value>11,442</Value>
<Delta>-1.71%</Delta>
</PerMillion>
<Rank>
<Value>7</Value>
<Delta>1</Delta>
</Rank>
<PerUser>
<Value>6.42</Value>
<Delta>-1.20%</Delta>
</PerUser>
</PageViews>
</UsageStatistic>
<UsageStatistic>
<TimeRange>
<Days>3</Days>
</TimeRange>
<Rank>
<Value>5</Value>
<Delta>0</Delta>
</Rank>
<Reach>
<Rank>
<Value>5</Value>
<Delta>0</Delta>
</Rank>
<PerMillion>
<Value>112,130</Value>
<Delta>-14.85%</Delta>
</PerMillion>
</Reach>
<PageViews>
<PerMillion>
<Value>11,314</Value>
<Delta>-13.39%</Delta>
</PerMillion>
<Rank>
<Value>6</Value>
<Delta>0</Delta>
</Rank>
<PerUser>
<Value>7.99</Value>
<Delta>+1.4%</Delta>
</PerUser>
</PageViews>
</UsageStatistic>
<UsageStatistic>
<TimeRange>
<Months>3</Months>
</TimeRange>
<Rank>
<Value>5</Value>
<Delta>0</Delta>
</Rank>
<Reach>
<Rank>
<Value>5</Value>
<Delta>0</Delta>
</Rank>
<PerMillion>
<Value>112,130</Value>
<Delta>-14.85%</Delta>
</PerMillion>
</Reach>
<PageViews>
<PerMillion>
<Value>11,314</Value>
<Delta>-13.39%</Delta>
</PerMillion>
<Rank>
<Value>6</Value>
<Delta>0</Delta>
</Rank>
<PerUser>
<Value>6.99</Value>
<Delta>+1.6%</Delta>
</PerUser>
</PageViews>
</UsageStatistic>
</UsageStatistics>
</TrafficData>
</Alexa></UrlInfoResult><aws:ResponseStatus><aws:StatusCode>Success</aws:StatusCode></aws:ResponseStatus></Response></aws:UrlInfoResponse>
This is my code so far. It reads the XML response above from alexa_response.xml:
import xml.etree.ElementTree as ET

prefix = "aws"
uri = "http://alexa.amazonaws.com/doc/2005-10-05/"
ET.register_namespace(prefix, uri)
tree = ET.parse('alexa_response.xml')
root = tree.getroot()
for a in root.iter("UsageStatistic"):
    for b in a:
        if b.tag == 'TimeRange':
            for c in b:
                print(c.tag, c.text)
        if b.tag == 'PageViews':
            for d in b:
                if d.tag == 'PerUser':
                    for f in d:
                        if f.tag == 'Value':
                            print(f.tag, f.text)
    print()
Result:
Days 7
Value 6.42
Months 3
Value 6.99
I only need Value 6.42, which is the PageViews/PerUser/Value from the same subtree where TimeRange/Days is 7.
Is there a better way to do this than iterating with multiple nested loops and ifs?
Answer 0 (score: 1)
You can do this with a single XPath expression:
//UsageStatistic/PageViews/PerUser/Value[../../../TimeRange/Days=7]
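For anyone limited to the standard library: xml.etree.ElementTree supports only a subset of XPath, so the expression above may not be accepted there as written. Below is a minimal sketch of the same filter using findtext. It assumes a trimmed stand-in for alexa_response.xml; the SAMPLE string and the per_user_values helper are illustrative names, not part of the question.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for alexa_response.xml (same element layout).
SAMPLE = """
<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
  <Response><UrlInfoResult><Alexa><TrafficData><UsageStatistics>
    <UsageStatistic>
      <TimeRange><Days>7</Days></TimeRange>
      <PageViews><PerUser><Value>6.42</Value></PerUser></PageViews>
    </UsageStatistic>
    <UsageStatistic>
      <TimeRange><Days>3</Days></TimeRange>
      <PageViews><PerUser><Value>7.99</Value></PerUser></PageViews>
    </UsageStatistic>
    <UsageStatistic>
      <TimeRange><Months>3</Months></TimeRange>
      <PageViews><PerUser><Value>6.99</Value></PerUser></PageViews>
    </UsageStatistic>
  </UsageStatistics></TrafficData></Alexa></UrlInfoResult></Response>
</aws:UrlInfoResponse>
"""


def per_user_values(root, unit="Days", amount="7"):
    """Return PageViews/PerUser/Value texts of the UsageStatistic
    elements whose TimeRange/<unit> text equals `amount`."""
    return [
        stat.findtext("PageViews/PerUser/Value")
        for stat in root.iter("UsageStatistic")
        # findtext returns None when the path is missing, so a
        # Months-based TimeRange simply fails the Days comparison.
        if stat.findtext(f"TimeRange/{unit}") == amount
    ]


root = ET.fromstring(SAMPLE.strip())
print(per_user_values(root))
print(per_user_values(root, unit="Months", amount="3"))
```

The comparison is done on text, so the amount is passed as a string. Filtering on the UsageStatistic element and then reading both paths from it keeps the condition and the value in the same subtree.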
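Answer 1 (score: 0)
Thanks for the comments and answers, @max and @Parfait. I had to modify the expression a bit to get it working, so I am posting it as my own answer.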
import lxml.etree as lET

prefix = "aws"
uri = "http://alexa.amazonaws.com/doc/2005-10-05/"
lET.register_namespace(prefix, uri)
doc = lET.parse('alexa_response.xml')
doc_root = doc.getroot()
# Filter on UsageStatistic first, then descend to the value in the same subtree.
for value in doc_root.xpath('.//UsageStatistic[TimeRange/Days="7"]/PageViews/PerUser/Value'):
    print(value.text)
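If the value for every time range is needed rather than only the 7-day one, a lookup table keyed by time range may be handy. This is a rough sketch using only the standard library. It assumes each TimeRange has a single child such as Days or Months, as in the question's XML; SAMPLE and per_user_by_range are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for alexa_response.xml (same element layout).
SAMPLE = """
<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
  <Response><UrlInfoResult><Alexa><TrafficData><UsageStatistics>
    <UsageStatistic>
      <TimeRange><Days>7</Days></TimeRange>
      <PageViews><PerUser><Value>6.42</Value></PerUser></PageViews>
    </UsageStatistic>
    <UsageStatistic>
      <TimeRange><Days>3</Days></TimeRange>
      <PageViews><PerUser><Value>7.99</Value></PerUser></PageViews>
    </UsageStatistic>
    <UsageStatistic>
      <TimeRange><Months>3</Months></TimeRange>
      <PageViews><PerUser><Value>6.99</Value></PerUser></PageViews>
    </UsageStatistic>
  </UsageStatistics></TrafficData></Alexa></UrlInfoResult></Response>
</aws:UrlInfoResponse>
"""


def per_user_by_range(root):
    """Map (unit, amount) -> PageViews/PerUser/Value as a float."""
    table = {}
    for stat in root.iter("UsageStatistic"):
        time_range = stat.find("TimeRange")
        value = stat.findtext("PageViews/PerUser/Value")
        # Skip entries that lack a time range or a per-user value.
        if time_range is None or len(time_range) == 0 or value is None:
            continue
        unit = time_range[0]  # assumed: the first child names the unit
        # float() suits PerUser values like "6.42"; values with
        # thousands separators (e.g. "11,442") would need cleaning first.
        table[(unit.tag, int(unit.text))] = float(value)
    return table


table = per_user_by_range(ET.fromstring(SAMPLE.strip()))
print(table[("Days", 7)])
print(table[("Months", 3)])
```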