I have the following XML document:
<?xml version='1.0' encoding='UTF-8'?><entry xmlns='http://www.w3.org/2005/Atom' xmlns:gd='http://schemas.google.com/g/2005' xmlns:issues='http://schemas.google.com/projecthosting/issues/2009' gd:etag='W/"DEAERH47eCl7ImA9WhZTFEQ."'><id>http://code.google.com/feeds/issues/p/chromium/issues/full/921</id><published>2008-09-03T22:51:22.000Z</published><updated>2011-03-19T01:05:05.000Z</updated><title>Incorrect rendering</title><content type='html'>Product Version : 0.2.149.27
URLs (if applicable) : http://www.battlefield.ea.com/battlefield/bf/
<b>Other browsers tested:</b>
<b>Add OK or FAIL after other browsers where you have tested this issue:</b>
Safari 3: N/A
Firefox 3: OK
IE 7: OK
Opera 9.60: OK
<b>What steps will reproduce the problem?</b>
1. Open http://www.battlefield.ea.com/battlefield/bf/
2. Look at incorrect render
</content><link rel='replies' type='application/atom+xml' href='http://code.google.com/feeds/issues/p/chromium/issues/921/comments/full'/><link rel='alternate' type='text/html' href='http://code.google.com/p/chromium/issues/detail?id=921'/><link rel='self' type='application/atom+xml' href='https://code.google.com/feeds/issues/p/chromium/issues/full/921'/><author><name>Dragon31...@gmail.com</name><uri>/u/@UBBRQVRZAxFEXgB4GA%3D%3D/</uri></author><issues:closedDate>2009-05-14T20:08:31.000Z</issues:closedDate><issues:id>921</issues:id><issues:label>Type-Bug</issues:label><issues:label>Pri-2</issues:label><issues:label>OS-All</issues:label><issues:label>Area-Compat</issues:label><issues:label>Webkit-specific</issues:label><issues:label>Mstone-2.1</issues:label><issues:label>compat-bug-2.0</issues:label><issues:label>Report-to-webkit</issues:label><issues:label>bulkmove</issues:label><issues:label>Action-ReductionNeeded</issues:label><issues:stars>5</issues:stars><issues:state>closed</issues:state><issues:status>WontFix</issues:status></entry>
I am parsing this document with feedparser. This is what I did:
import feedparser
text = ""  # placeholder: the XML document above, read in as a string
d = feedparser.parse(text)
d.entries[0].issues_label
I found that I only get a single label back:
d.entries[0].issues_label
u'Action-ReductionNeeded'
However, the document contains multiple issues:label elements:
<issues:label>Type-Bug</issues:label><issues:label>Pri-2</issues:label><issues:label>OS-All</issues:label><issues:label>Area-Compat</issues:label><issues:label>Webkit-specific</issues:label><issues:label>Mstone-2.1</issues:label><issues:label>compat-bug-2.0</issues:label><issues:label>Report-to-webkit</issues:label><issues:label>bulkmove</issues:label><issues:label>Action-ReductionNeeded</issues:label>
But I can only get back the last one. I want to retrieve all of them.
Answer 0 (score: 1)
You can use lxml to parse the XML and select every issues:label element with a namespace-aware XPath query:
>>> import lxml.etree
>>> # xml is a path to (or an open file object for) the document above
>>> doc = lxml.etree.parse(xml)
>>> ns = {'issues':'http://schemas.google.com/projecthosting/issues/2009'}
>>> [x.text for x in doc.xpath('//issues:label', namespaces=ns)]
['Type-Bug',
'Pri-2',
'OS-All',
'Area-Compat',
'Webkit-specific',
'Mstone-2.1',
'compat-bug-2.0',
'Report-to-webkit',
'bulkmove',
'Action-ReductionNeeded']
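If the XML is already in a string rather than a file, or lxml is not installed, the same idea can be sketched with the standard library's xml.etree.ElementTree instead. This is a minimal sketch, not the approach from the answer above: SAMPLE is a trimmed, illustrative copy of the entry from the question, and issue_labels is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative copy of the entry from the question (assumption:
# only the elements needed to show the technique are kept).
SAMPLE = (
    "<?xml version='1.0' encoding='UTF-8'?>"
    "<entry xmlns='http://www.w3.org/2005/Atom' "
    "xmlns:issues='http://schemas.google.com/projecthosting/issues/2009'>"
    "<title>Incorrect rendering</title>"
    "<issues:id>921</issues:id>"
    "<issues:label>Type-Bug</issues:label>"
    "<issues:label>Pri-2</issues:label>"
    "<issues:label>OS-All</issues:label>"
    "<issues:state>closed</issues:state>"
    "</entry>"
)

# Map the prefix used in the query to the namespace URI from the document.
NS = {"issues": "http://schemas.google.com/projecthosting/issues/2009"}


def issue_labels(xml_text):
    """Return the text of every issues:label element, in document order."""
    # Encode to bytes so the parser honours the encoding declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [el.text for el in root.findall(".//issues:label", NS)]


labels = issue_labels(SAMPLE)
print(labels)
```

Encoding the string to bytes before parsing also matters if you switch back to lxml, which refuses to parse a unicode string that carries an encoding declaration.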