Unable to retrieve multiple tags from a feed using feedparser

Date: 2011-07-02 21:05:52

Tags: python xml feedparser atom-feed

I have the following XML document:

<?xml version='1.0' encoding='UTF-8'?><entry xmlns='http://www.w3.org/2005/Atom' xmlns:gd='http://schemas.google.com/g/2005' xmlns:issues='http://schemas.google.com/projecthosting/issues/2009' gd:etag='W/"DEAERH47eCl7ImA9WhZTFEQ."'><id>http://code.google.com/feeds/issues/p/chromium/issues/full/921</id><published>2008-09-03T22:51:22.000Z</published><updated>2011-03-19T01:05:05.000Z</updated><title>Incorrect rendering</title><content type='html'>Product Version      : 0.2.149.27
URLs (if applicable) : http://www.battlefield.ea.com/battlefield/bf/
<b>Other browsers tested:</b>
<b>Add OK or FAIL after other browsers where you have tested this issue:</b>
     Safari 3: N/A
    Firefox 3: OK
         IE 7: OK
   Opera 9.60: OK

<b>What steps will reproduce the problem?</b>
1. Open http://www.battlefield.ea.com/battlefield/bf/
2. Look at incorrect render
</content><link rel='replies' type='application/atom+xml' href='http://code.google.com/feeds/issues/p/chromium/issues/921/comments/full'/><link rel='alternate' type='text/html' href='http://code.google.com/p/chromium/issues/detail?id=921'/><link rel='self' type='application/atom+xml' href='https://code.google.com/feeds/issues/p/chromium/issues/full/921'/><author><name>Dragon31...@gmail.com</name><uri>/u/@UBBRQVRZAxFEXgB4GA%3D%3D/</uri></author><issues:closedDate>2009-05-14T20:08:31.000Z</issues:closedDate><issues:id>921</issues:id><issues:label>Type-Bug</issues:label><issues:label>Pri-2</issues:label><issues:label>OS-All</issues:label><issues:label>Area-Compat</issues:label><issues:label>Webkit-specific</issues:label><issues:label>Mstone-2.1</issues:label><issues:label>compat-bug-2.0</issues:label><issues:label>Report-to-webkit</issues:label><issues:label>bulkmove</issues:label><issues:label>Action-ReductionNeeded</issues:label><issues:stars>5</issues:stars><issues:state>closed</issues:state><issues:status>WontFix</issues:status></entry>

I parse this document with feedparser. This is what I did:

import feedparser
text = ""  # contents of the XML document above
d = feedparser.parse(text)
d.entries[0].issues_label

I found that I only get a single label back:

d.entries[0].issues_label
u'Action-ReductionNeeded'

However, the document contains multiple issues:label elements:

<issues:label>Type-Bug</issues:label><issues:label>Pri-2</issues:label><issues:label>OS-All</issues:label><issues:label>Area-Compat</issues:label><issues:label>Webkit-specific</issues:label><issues:label>Mstone-2.1</issues:label><issues:label>compat-bug-2.0</issues:label><issues:label>Report-to-webkit</issues:label><issues:label>bulkmove</issues:label><issues:label>Action-ReductionNeeded</issues:label>

But I can only retrieve the last one. I would like to retrieve all of them.

1 answer:

Answer 0 (score: 1)

You can use lxml to parse the XML instead:

>>> import lxml.etree
>>> # xml is a filename or file-like object holding the document above
>>> doc = lxml.etree.parse(xml)
>>> ns = {'issues': 'http://schemas.google.com/projecthosting/issues/2009'}
>>> [x.text for x in doc.xpath('//issues:label', namespaces=ns)]
['Type-Bug',
 'Pri-2',
 'OS-All',
 'Area-Compat',
 'Webkit-specific',
 'Mstone-2.1',
 'compat-bug-2.0',
 'Report-to-webkit',
 'bulkmove',
 'Action-ReductionNeeded']
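
If lxml is not installed, or the document is already held in a string as in the question, the same namespace lookup can probably be done with the standard library's xml.etree.ElementTree. This is a minimal sketch that swaps in a different parser, not the lxml approach above. SAMPLE is a shortened stand-in for the entry in the question (only three labels kept), and extract_labels is a hypothetical helper name used for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the entry in the question (assumption: only the
# title and three of the issues:label elements are kept for brevity).
SAMPLE = (
    "<?xml version='1.0' encoding='UTF-8'?>"
    "<entry xmlns='http://www.w3.org/2005/Atom' "
    "xmlns:issues='http://schemas.google.com/projecthosting/issues/2009'>"
    "<title>Incorrect rendering</title>"
    "<issues:label>Type-Bug</issues:label>"
    "<issues:label>Pri-2</issues:label>"
    "<issues:label>OS-All</issues:label>"
    "</entry>"
)

# Prefix-to-URI mapping; the URI is the one declared in the document.
ISSUES_NS = {"issues": "http://schemas.google.com/projecthosting/issues/2009"}


def extract_labels(xml_text):
    """Return the text of every issues:label element, in document order."""
    # Parse from bytes so the XML encoding declaration is handled by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [el.text for el in root.findall(".//issues:label", ISSUES_NS)]


labels = extract_labels(SAMPLE)
print(labels)
```

Because findall returns every matching element rather than a single attribute value, repeated issues:label elements are all kept instead of the last one replacing the others.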