How do I extract values from an XML tree using Python?

Time: 2015-02-12 11:47:04

Tags: python xml api urllib alexa

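I have an API query that returns the XML tree below, and I want to extract certain values from it. In particular, I want to extract information such as LinksInCount.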

<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
<aws:OperationRequest>
<aws:RequestId>5486794a-0d03-4d47-a45b-e95764c3f0ee</aws:RequestId>
</aws:OperationRequest>
<aws:UrlInfoResult>
<aws:Alexa>

  <aws:ContentData>
    <aws:DataUrl type="canonical">yahoo.com/</aws:DataUrl>
    <aws:Asin>B00006D2TC</aws:Asin>
    <aws:SiteData>
      <aws:Title>Yahoo!</aws:Title>
      <aws:Description>Personalized content and search options. Chatrooms, free e-mail, clubs, and pager.</aws:Description>
      <aws:OnlineSince>18-Jan-1995</aws:OnlineSince>
    </aws:SiteData>
    <aws:Speed>
      <aws:MedianLoadTime>2242</aws:MedianLoadTime>
      <aws:Percentile>51</aws:Percentile>
    </aws:Speed>
    <aws:AdultContent>no</aws:AdultContent>
    <aws:Language>
      <aws:Locale>en</aws:Locale>
    </aws:Language>
    <aws:LinksInCount>76894</aws:LinksInCount>
    <aws:OwnedDomains>
      <aws:OwnedDomain>
        <aws:Domain>yahooligans.com</aws:Domain>
        <aws:Title>yahooligans.com</aws:Title>
      </aws:OwnedDomain>
    </aws:OwnedDomains>
  </aws:ContentData>

  <aws:Related>
    <aws:DataUrl type="canonical">yahoo.com/</aws:DataUrl>
    <aws:Asin>B00006D2TC</aws:Asin>
    <aws:RelatedLinks>
      <aws:RelatedLink>
        <aws:DataUrl type="canonical">aol.com/</aws:DataUrl>
        <aws:NavigableUrl>http://aol.com/</aws:NavigableUrl>
        <aws:Asin>B00006ARD3</aws:Asin>
        <aws:Relevance>301</aws:Relevance>
      </aws:RelatedLink>
    </aws:RelatedLinks>
    <aws:Categories>
      <aws:CategoryData>
        <aws:Title>On the Web/Web Portals</aws:Title>
        <aws:AbsolutePath>Top/Computers/Internet/On_the_Web/Web_Portals</aws:AbsolutePath>
      </aws:CategoryData>
    </aws:Categories>
  </aws:Related>        

  <aws:TrafficData>
    <aws:DataUrl type="canonical">yahoo.com/</aws:DataUrl>
    <aws:Asin>B00006D2TC</aws:Asin>
    <aws:Rank>1</aws:Rank>
    <aws:UsageStatistics>

      <aws:UsageStatistic>
        <aws:TimeRange>
          <aws:Days>1</aws:Days>
        </aws:TimeRange>
        <aws:Rank>
          <aws:Value>1</aws:Value>
          <aws:Delta>+0</aws:Delta>
        </aws:Rank>
        <aws:Reach>
          <aws:Rank>
            <aws:Value>2</aws:Value>
            <aws:Delta>+0</aws:Delta>
          </aws:Rank>
          <aws:PerMillion>
            <aws:Value>252,500</aws:Value>
            <aws:Delta>-1%</aws:Delta>
          </aws:PerMillion>
        </aws:Reach>
        <aws:PageViews>
          <aws:PerMillion>
            <aws:Value>51,400</aws:Value>
            <aws:Delta>-1%</aws:Delta>
          </aws:PerMillion>
          <aws:Rank>
            <aws:Value>1</aws:Value>
            <aws:Delta>+0</aws:Delta>
          </aws:Rank>
          <aws:PerUser>
            <aws:Value>13.7</aws:Value>
            <aws:Delta>-1%</aws:Delta>
          </aws:PerUser>
        </aws:PageViews>
      </aws:UsageStatistic>

    </aws:UsageStatistics>
  </aws:TrafficData>

</aws:Alexa>
</aws:UrlInfoResult>
<aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:StatusCode>Success</aws:StatusCode>
</aws:ResponseStatus>
</aws:Response>
</aws:UrlInfoResponse> 

Once I have the tree, I can get the response status with the following code:

elem = tree.find(".//{http://alexa.amazonaws.com/doc/2005-10-05/}StatusCode")
print(elem.text)

However, I am not sure how to get the LinksInCount value contained in:

 <aws:LinksInCount>76894</aws:LinksInCount>

I tried the following:

elem = tree.find(".//{http://alexa.amazonaws.com/doc/2005-10-05/}LinksInCount")
print(elem.text)


elem = tree.find("LinksInCount")
print(elem.text)

http://docs.aws.amazon.com/AlexaWebInfoService/latest/

1 Answer:

Answer 0 (score: 0)

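It looks like you are using ElementTree. Given a bare tag name, the find method only searches the direct children of the current element. Try iterfind instead.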
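
A minimal sketch of a descendant search, assuming the standard-library xml.etree.ElementTree module and a trimmed copy of the response from the question. One detail in that response is worth noting: the aws:Response element rebinds the aws prefix to http://awis.amazonaws.com/doc/2005-07-11, and aws:ResponseStatus binds it back to the 2005-10-05 URI. So StatusCode and LinksInCount sit in different namespaces, which may be why the first lookup worked and the second did not. The prefix names alexa and awis and the helper find_by_local_name are made up for this example and are not part of any API.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question; only the namespace
# declarations and the two elements of interest are kept.
XML = """\
<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
<aws:UrlInfoResult>
<aws:Alexa>
  <aws:ContentData>
    <aws:LinksInCount>76894</aws:LinksInCount>
  </aws:ContentData>
</aws:Alexa>
</aws:UrlInfoResult>
<aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:StatusCode>Success</aws:StatusCode>
</aws:ResponseStatus>
</aws:Response>
</aws:UrlInfoResponse>
"""

root = ET.fromstring(XML)

# Prefix names are arbitrary labels for this sketch (an assumption);
# only the URIs have to match the document.
NS = {
    "alexa": "http://alexa.amazonaws.com/doc/2005-10-05/",
    "awis": "http://awis.amazonaws.com/doc/2005-07-11",
}

# ".//" searches all descendants rather than only direct children.
status = root.find(".//alexa:StatusCode", NS)
links = root.find(".//awis:LinksInCount", NS)

# iterfind accepts the same path syntax and yields every match.
all_counts = [e.text for e in root.iterfind(".//awis:LinksInCount", NS)]


def find_by_local_name(element, name):
    """Hypothetical helper: first descendant whose tag, ignoring the
    namespace part, equals name. Returns None if nothing matches."""
    for node in element.iter():
        if node.tag.rsplit("}", 1)[-1] == name:
            return node
    return None


print(status.text)
print(links.text)
```

The helper is a fallback for cases where the namespace URI is not known in advance; when it is known, the namespace-qualified path is the more precise choice.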