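My RSS file looks like the one below, and I want to get the contents of the media:group section. I went through the feedparser documentation, but it doesn't seem to mention this. How can I do it? Any help is appreciated.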
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:ymusic="http://music.yahoo.com/rss/1.0/ymusic/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel>
<title>XYZ InfoX: Special hello </title>
<link>http://www1.XYZInfoX.com/learninghello/home</link>
<description>hello</description>
<language>en</language> <copyright />
<pubDate>Wed, 17 Mar 2010 08:50:06 GMT</pubDate>
<dc:creator />
<dc:date>2010-03-17T08:50:06Z</dc:date>
<dc:language>en</dc:language> <dc:rights />
<image>
<title>Voice of America</title>
<link>http://www1.XYZInfoX.com/learninghello</link>
<url>http://media.XYZInfoX.com/designimages/XYZRSSIcon.gif</url>
</image>
<item>
<title>Who Were the Deadliest Gunmen of the Wild West?</title>
<link>http://www1.XYZInfoX.com/learninghello/home/Deadliest-Gunmen-of-the-Wild-West-87826807.html</link>
<description> The story of two of them: "Killin'" Jim Miller was an outlaw, "Texas" John Slaughter was a lawman | EXPLORATIONS </description>
<pubDate>Wed, 17 Mar 2010 00:38:48 GMT</pubDate>
<guid isPermaLink="false">87826807</guid>
<dc:creator></dc:creator>
<dc:date>2010-03-17T00:38:48Z</dc:date>
<media:group>
<media:content url="http://media.XYZInfoX.com/images/archives_peace_comm_480_16mar_se.jpg" medium="image" isDefault="true" height="300" width="480" />
<media:content url="http://media.XYZInfoX.com/images/archives_peace_comm_230_16mar_se_edited-1.jpg" medium="image" isDefault="false" height="230" width="230" />
<media:content url="http://media.XYZInfoX.com/images/tex_trans_lawmans_230_16mar10_se.jpg" medium="image" isDefault="false" height="230" width="230" />
<media:content url="http://www.XYZInfoX.com/MediaAssets2/learninghello/dalet/se-exp-outlaws-part2-17mar2010.Mp3" type="audio/mpeg" medium="audio" isDefault="false" />
</media:group>
</item>
Answer 0 (score: 5)
This is a bug in feedparser 4.1.
My solution was to get the latest feedparser.py (4.2 pre) from the repository.
svn checkout http://feedparser.googlecode.com/svn/trunk/ feedparser-readonly
cd feedparser-readonly
python setup.py install
Now you can access all the MRSS items:
>>> import feedparser # the new version!
>>> d = feedparser.parse(MY_XML_URL)
>>> for content in d.entries[0].media_content: print(content['url'])
That should do the job for you.
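If upgrading feedparser is not an option, the media:group contents can also be read with the standard library instead of feedparser. The following is only a minimal sketch of that alternative, not part of the fix above: the function name media_group_contents and the trimmed SAMPLE string are made up for illustration, and the namespace URI is the xmlns:media value from the feed in the question.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns:media declaration in the question's feed.
MEDIA_NS = {"media": "http://search.yahoo.com/mrss/"}

# Trimmed, closed-off copy of the question's feed (illustration only).
SAMPLE = """<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0"><channel>
<item>
<title>Who Were the Deadliest Gunmen of the Wild West?</title>
<media:group>
<media:content url="http://media.XYZInfoX.com/images/archives_peace_comm_480_16mar_se.jpg" medium="image" isDefault="true" height="300" width="480" />
<media:content url="http://www.XYZInfoX.com/MediaAssets2/learninghello/dalet/se-exp-outlaws-part2-17mar2010.Mp3" type="audio/mpeg" medium="audio" isDefault="false" />
</media:group>
</item>
</channel></rss>"""


def media_group_contents(xml_text):
    """Return one list per <item>; each list holds the attribute dicts
    of that item's media:group/media:content elements."""
    root = ET.fromstring(xml_text)
    groups = []
    for item in root.iter("item"):
        contents = item.findall("media:group/media:content", MEDIA_NS)
        groups.append([dict(c.attrib) for c in contents])
    return groups


# Print the medium and URL of every media:content in the first item.
for attrs in media_group_contents(SAMPLE)[0]:
    print(attrs["medium"], attrs["url"])
```

This bypasses feedparser entirely, so none of its date or encoding handling applies; it only pulls the raw attributes out of the XML.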
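Answer 1 (score: 0)
You can parse the feed with feed = feedparser.parse(your_feeds_url), then access your XML elements using either Python attribute access or dictionary access on feed and its child elements. Attribute access does not work for element names like media:content, so use dictionary access for those.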
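A rough sketch of the dictionary-access style, assuming the parsed result exposes each entry's media:content elements as a list of dicts under the key media_content (the name the first answer reads). The helper media_urls and the stand-in data are hypothetical; the stand-in only mimics that assumed shape so the example runs without a network call.

```python
def media_urls(parsed, medium=None):
    """Collect media:content URLs from a feedparser-style result,
    using dictionary access only. Optionally filter by medium."""
    urls = []
    for entry in parsed.get("entries", []):
        # .get() keeps entries without any media content from raising KeyError.
        for content in entry.get("media_content", []):
            url = content.get("url")
            if url and (medium is None or content.get("medium") == medium):
                urls.append(url)
    return urls


# Stand-in for a parsed feed (assumed shape, NOT real feedparser output).
stand_in = {
    "entries": [
        {
            "title": "Who Were the Deadliest Gunmen of the Wild West?",
            "media_content": [
                {"url": "http://media.XYZInfoX.com/images/archives_peace_comm_480_16mar_se.jpg",
                 "medium": "image"},
                {"url": "http://www.XYZInfoX.com/MediaAssets2/learninghello/dalet/se-exp-outlaws-part2-17mar2010.Mp3",
                 "medium": "audio"},
            ],
        },
        {"title": "An entry without media"},
    ]
}

print(media_urls(stand_in))
print(media_urls(stand_in, medium="audio"))
```

With feedparser installed, the same helper would be called as media_urls(feedparser.parse(your_feeds_url), medium="audio"), provided the installed version populates media_content as assumed here.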