Pythonic way to get an XML tree element from the same subtree as another element

Posted: 2016-03-15 23:26:07

Tags: python xml

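Is there a more elegant, Pythonic way to get certain elements of an XML tree that lie in the same subtree as another element, instead of iterating with nested loops and ifs?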

That is, in pseudo-SQL:

select UsageStatistic/PageViews/PerUser/Value from Tree where UsageStatistic/TimeRange/Days=7

Below is a well-formed subset of the XML response from Alexa Amazon AWIS, saved as alexa_response.xml:

<?xml version="1.0" encoding="utf-8"?>
<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/"><Response><OperationRequest><RequestId>dsfadf</RequestId></OperationRequest><UrlInfoResult><Alexa>
<TrafficData>
<DataUrl type="canonical">yahoo.com</DataUrl>
<UsageStatistics>

<UsageStatistic>
<TimeRange>
<Days>7</Days>
</TimeRange>
<Rank>
<Value>5</Value>
<Delta>0</Delta>
</Rank>
<Reach>
<Rank>
<Value>5</Value>
<Delta>0</Delta>
</Rank>
<PerMillion>
<Value>111,200</Value>
<Delta>-0.49%</Delta>
</PerMillion>
</Reach>
<PageViews>
<PerMillion>
<Value>11,442</Value>
<Delta>-1.71%</Delta>
</PerMillion>
<Rank>
<Value>7</Value>
<Delta>1</Delta>
</Rank>
<PerUser>
<Value>6.42</Value>
<Delta>-1.20%</Delta>
</PerUser>
</PageViews>
</UsageStatistic>

<UsageStatistic>
<TimeRange>
<Days>3</Days>
</TimeRange>
<Rank>
<Value>5</Value>
<Delta>0</Delta>
</Rank>
<Reach>
<Rank>
<Value>5</Value>
<Delta>0</Delta>
</Rank>
<PerMillion>
<Value>112,130</Value>
<Delta>-14.85%</Delta>
</PerMillion>
</Reach>
<PageViews>
<PerMillion>
<Value>11,314</Value>
<Delta>-13.39%</Delta>
</PerMillion>
<Rank>
<Value>6</Value>
<Delta>0</Delta>
</Rank>
<PerUser>
<Value>7.99</Value>
<Delta>+1.4%</Delta>
</PerUser>
</PageViews>
</UsageStatistic>

<UsageStatistic>
<TimeRange>
<Months>3</Months>
</TimeRange>
<Rank>
<Value>5</Value>
<Delta>0</Delta>
</Rank>
<Reach>
<Rank>
<Value>5</Value>
<Delta>0</Delta>
</Rank>
<PerMillion>
<Value>112,130</Value>
<Delta>-14.85%</Delta>
</PerMillion>
</Reach>
<PageViews>
<PerMillion>
<Value>11,314</Value>
<Delta>-13.39%</Delta>
</PerMillion>
<Rank>
<Value>6</Value>
<Delta>0</Delta>
</Rank>
<PerUser>
<Value>6.99</Value>
<Delta>+1.6%</Delta>
</PerUser>
</PageViews>
</UsageStatistic>

</UsageStatistics>
</TrafficData>
</Alexa></UrlInfoResult><aws:ResponseStatus><aws:StatusCode>Success</aws:StatusCode></aws:ResponseStatus></Response></aws:UrlInfoResponse>

Here is my code so far:

import xml.etree.ElementTree as ET

prefix = "aws"
uri = "http://alexa.amazonaws.com/doc/2005-10-05/"
ET.register_namespace(prefix, uri)  # not needed for parsing or searching
tree = ET.parse('alexa_response.xml')
root = tree.getroot()
for a in root.iter("UsageStatistic"):
    for b in a:
        if b.tag == 'TimeRange':
            # Print the time unit and amount, e.g. "Days 7"
            for c in b:
                print(c.tag, c.text)
        if b.tag == 'PageViews':
            # Drill down to PageViews/PerUser/Value
            for d in b:
                if d.tag == 'PerUser':
                    for f in d:
                        if f.tag == 'Value':
                            print(f.tag, f.text)
    print()

Its output includes:

Days 7
Value 6.42

Months 3
Value 6.99

But I only need Value 6.42, that is, the PageViews/PerUser/Value from the same UsageStatistic subtree in which TimeRange/Days is 7.

Is there a better way to do this than iterating with multiple nested loops and ifs?

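2 answers:

Answer 0 (score: 1)

You can do this with a single XPath expression. It needs a full XPath 1.0 engine such as lxml's xpath() method; ElementTree's findall() supports only a limited subset of XPath and cannot evaluate this predicate: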

//UsageStatistic/PageViews/PerUser/Value[../../../TimeRange/Days=7]
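ElementTree's path language does support [tag='text'] filters and .. for the parent element, so a rough standard-library counterpart of that expression selects the TimeRange whose Days child is "7" and then steps back up to its UsageStatistic. A minimal sketch, assuming an abridged inline copy of the question's response instead of reading alexa_response.xml:

```python
import xml.etree.ElementTree as ET

# Abridged copy of the AWIS response from the question (assumption: only
# the elements this lookup touches are kept).
XML = """<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<Response><UrlInfoResult><Alexa><TrafficData><UsageStatistics>
<UsageStatistic><TimeRange><Days>7</Days></TimeRange>
  <PageViews><PerUser><Value>6.42</Value></PerUser></PageViews></UsageStatistic>
<UsageStatistic><TimeRange><Days>3</Days></TimeRange>
  <PageViews><PerUser><Value>7.99</Value></PerUser></PageViews></UsageStatistic>
<UsageStatistic><TimeRange><Months>3</Months></TimeRange>
  <PageViews><PerUser><Value>6.99</Value></PerUser></PageViews></UsageStatistic>
</UsageStatistics></TrafficData></Alexa></UrlInfoResult></Response>
</aws:UrlInfoResponse>"""

root = ET.fromstring(XML)

# Match the TimeRange whose <Days> child is exactly "7", go up one level
# with ".." to its UsageStatistic, then walk down to the PerUser value.
PATH = ".//UsageStatistic/TimeRange[Days='7']/../PageViews/PerUser/Value"
values = [v.text for v in root.findall(PATH)]
print(values)  # ['6.42']
```

The UsageStatistic elements carry no namespace (the response only declares the aws: prefix), so plain tag names work in the path.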

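Answer 1 (score: 0)

Thanks for the comments and answers, @max and @Parfait. I had to modify the expression a little to get it to work, so I'm posting it as my own answer.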

import lxml.etree as lET

prefix = "aws"
uri = "http://alexa.amazonaws.com/doc/2005-10-05/"
lET.register_namespace(prefix, uri)  # not needed for the query below
doc = lET.parse('alexa_response.xml')
doc_root = doc.getroot()
# Keep only the UsageStatistic whose TimeRange/Days is "7", then walk down to its PerUser value
for value in doc_root.xpath('.//UsageStatistic[TimeRange/Days="7"]/PageViews/PerUser/Value'):
    print(value.text)
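If lxml is not available, the same lookup can be written with only the standard library by reading both paths relative to each UsageStatistic with findtext(), which replaces the nested loops and ifs with one loop and one comparison. This is a sketch, not taken from either answer: per_user_value and per_user_by_range are hypothetical helper names, and the XML string is an abridged copy of the question's response.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the AWIS response from the question (assumption).
XML = """<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<Response><UrlInfoResult><Alexa><TrafficData><UsageStatistics>
<UsageStatistic><TimeRange><Days>7</Days></TimeRange>
  <PageViews><PerUser><Value>6.42</Value></PerUser></PageViews></UsageStatistic>
<UsageStatistic><TimeRange><Days>3</Days></TimeRange>
  <PageViews><PerUser><Value>7.99</Value></PerUser></PageViews></UsageStatistic>
<UsageStatistic><TimeRange><Months>3</Months></TimeRange>
  <PageViews><PerUser><Value>6.99</Value></PerUser></PageViews></UsageStatistic>
</UsageStatistics></TrafficData></Alexa></UrlInfoResult></Response>
</aws:UrlInfoResponse>"""

root = ET.fromstring(XML)


def per_user_value(root, days):
    """Hypothetical helper: PerUser value of the UsageStatistic whose
    TimeRange/Days equals `days`, or None if there is no such entry."""
    for stat in root.iter("UsageStatistic"):
        # findtext() returns None when the path is missing (e.g. a <Months> range)
        if stat.findtext("TimeRange/Days") == str(days):
            return stat.findtext("PageViews/PerUser/Value")
    return None


def per_user_by_range(root):
    """Hypothetical helper: map (unit, amount) of every TimeRange to its PerUser value."""
    result = {}
    for stat in root.iter("UsageStatistic"):
        unit = stat.find("TimeRange")[0]  # the single child, <Days> or <Months>
        result[(unit.tag, unit.text)] = stat.findtext("PageViews/PerUser/Value")
    return result


print(per_user_value(root, 7))  # 6.42
```

Because both paths are resolved from the same UsageStatistic element, the value is guaranteed to come from the same subtree as the matching TimeRange, which is the constraint the question asks about.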