How to get image URLs from an RSS feed using XPath / XQuery

Time: 2012-11-30 01:33:41

Tags: xml xpath rss xquery

Here is a sample from the RSS feed:

<item>
<title>Stem cells being made from blood</title>
<description>A patient's own blood has been used to make personalised stem cells, which doctors hope can be used to treat a range of diseases.</description>
<link>http://www.bbc.co.uk/news/health-20539835#sa-ns_mchannel=rss&ns_source=PublicRSS20-sa</link>
<guid isPermaLink="false">http://www.bbc.co.uk/news/health-20539835</guid>
<pubDate>Fri, 30 Nov 2012 00:45:53 GMT</pubDate>
<media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" width="66" height="49" url="http://news.bbcimg.co.uk/media/images/64469000/jpg/_64469170_c0147104-red_blood_cells%2C_artwork-spl.jpg"/>
<media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" width="144" height="81" url="http://news.bbcimg.co.uk/media/images/64469000/jpg/_64469171_c0147104-red_blood_cells%2C_artwork-spl.jpg"/>

I need to get the URLs of the .jpg images in the media:thumbnail elements. How can I do this with XPath or XQuery? Can anyone help me? Thanks!

1 answer:

Answer 0 (score: 5)

Assuming your XML is:

<item>
  <title>Stem cells being made from blood</title>
  <description>A patient's own blood has been used to make personalised stem cells, which doctors hope can be used to treat a range of diseases.</description>
  <link>http://www.bbc.co.uk/news/health-20539835#sa-ns_mchannel=rss&amp;ns_source=PublicRSS20-sa</link>
  <guid isPermaLink="false">http://www.bbc.co.uk/news/health-20539835</guid>
  <pubDate>Fri, 30 Nov 2012 00:45:53 GMT</pubDate>
  <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" width="66" height="49" url="http://news.bbcimg.co.uk/media/images/64469000/jpg/_64469170_c0147104-red_blood_cells%2C_artwork-spl.jpg"/>
  <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" width="144" height="81" url="http://news.bbcimg.co.uk/media/images/64469000/jpg/_64469171_c0147104-red_blood_cells%2C_artwork-spl.jpg"/>
</item>

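The XPath //item/media:thumbnail/@url selects the nodes you want. For this to work, you need to register the namespace and its prefix with your XPath engine. Alternatively, you can use this XPath, which needs no prefix registration: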

//item/*[local-name() = 'thumbnail' 
    and namespace-uri() = 'http://search.yahoo.com/mrss/']/@url
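
As a rough illustration of passing the namespace and prefix to an engine, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not part of the original answer. The <channel> wrapper, the FEED and NAMESPACES names, and the thumbnail_urls helper are assumptions made for the example. ElementTree supports only a subset of XPath and returns elements rather than attribute nodes, so the sketch selects the media:thumbnail elements and then reads url with .get().

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the item above, wrapped in a hypothetical <channel> root
# so that ".//item" has a descendant to match.
FEED = """<channel>
<item>
  <title>Stem cells being made from blood</title>
  <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" width="66" height="49" url="http://news.bbcimg.co.uk/media/images/64469000/jpg/_64469170_c0147104-red_blood_cells%2C_artwork-spl.jpg"/>
  <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" width="144" height="81" url="http://news.bbcimg.co.uk/media/images/64469000/jpg/_64469171_c0147104-red_blood_cells%2C_artwork-spl.jpg"/>
</item>
</channel>"""

# The prefix chosen here is arbitrary; only the namespace URI must match the feed.
NAMESPACES = {"media": "http://search.yahoo.com/mrss/"}


def thumbnail_urls(xml_text):
    """Return the url attribute of every media:thumbnail under an item."""
    root = ET.fromstring(xml_text)
    thumbnails = root.findall(".//item/media:thumbnail", NAMESPACES)
    # ElementTree cannot select @url directly, so read it from each element.
    return [thumb.get("url") for thumb in thumbnails]


if __name__ == "__main__":
    for url in thumbnail_urls(FEED):
        print(url)
```

A full XPath 1.0 engine would typically accept //item/media:thumbnail/@url directly once the prefix is registered; how registration is done differs from engine to engine.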