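I'm using lxml.etree.parse in Django to parse content from an external RSS feed, and findall to handle the namespace.
I can iterate over the search results, but I can't display any of the text they contain.
Here is the XML file I'm trying to read from: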
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Open Library : Author Name</title>
  <link href="http://www.somedomain.org/people/atom/author_name" rel="self"/>
  <updated>2012-03-20T16:41:00Z</updated>
  <author>
    <name>somedomain.org</name>
  </author>
  <id>tag:somedomain.org,2007:/person_feed/123456</id>
  <entry>
    <link href="http://www.somedomain.org/roll_call/show/1234" rel="alternate"/>
    <id>
      tag:somedomain.org,2012-03-20:/roll_call_vote/1234
    </id>
    <updated>2012-03-20T16:41:00Z</updated>
    <title>Once upon a time</title>
    <content type="html">
      This is a book full of words
    </content>
  </entry>
</feed>
Here is my Django view:
def openauthors(request):
    tree = lxml.etree.parse("http://www.somedomain.org/people/atom/author_name")
    namespace = "{http://www.w3.org/2005/Atom}"
    listings = tree.findall("{http://www.w3.org/2005/Atom}entry")
    listings_info = []
    for listing in listings:
        this_value = {
            "link": listing.findtext("content"),
            "title": listing.findtext("feed/content"),
            "content": listing.findtext("content"),
        }
        listings_info.append(this_value)
    json_listings = '{"listings":' + simplejson.dumps(listings_info) + '}'
    if("callback" in request.GET.keys()):
        callback = request.GET["callback"]
    else:
        callback = None
    if(callback):
        response = HttpResponse("%s(%s)" % (
            callback,
            simplejson.dumps(listings_info)
            ), mimetype="application/json"
        )
    else:
        response = HttpResponse(json_listings, mimetype="application/json")
    return response
I also tried the following, using xpath instead of findtext, but got the same result.
"link":listing.xpath("link/text()"),
"title":listing.xpath("entry/link/text()"),
"content":listing.xpath("content/text()"),
Any help is appreciated.
Answer 0 (score: 1)
You are not taking the XML namespace into account.
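To see why the lookups in the question come back empty, here is a minimal sketch that uses a trimmed, inlined copy of the feed so it runs without network access. The names SAMPLE and ATOM_NS are mine, not from the question, and the fallback import is only there because the standard library's ElementTree behaves the same way for these calls.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: stdlib ElementTree is an acceptable stand-in for find()/findtext().
    import xml.etree.ElementTree as etree

ATOM_NS = "http://www.w3.org/2005/Atom"

# Trimmed copy of the feed from the question (assumed representative).
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Once upon a time</title>
    <content type="html">This is a book full of words</content>
  </entry>
</feed>"""

root = etree.fromstring(SAMPLE)
entry = root.find("{%s}entry" % ATOM_NS)

# Unqualified name: only matches elements in no namespace, so nothing is found.
unqualified = entry.findtext("content")

# Namespace-qualified name: matches the Atom <content> element.
qualified = entry.findtext("{%s}content" % ATOM_NS)

print(unqualified)  # None
print(qualified)    # This is a book full of words
```

The xmlns attribute on <feed> puts every child element into the Atom namespace, so a bare "content" never matches. With XPath, the namespace is supplied through a prefix mapping instead: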
import lxml.etree

tree = lxml.etree.parse("http://www.somedomain.org/people/atom/author_name")
# Map a prefix of your choosing to the feed's default namespace.
xmlns = {"atom": "http://www.w3.org/2005/Atom"}
listings = tree.xpath("//atom:entry", namespaces=xmlns)
listings_info = []
for listing in listings:
    # Each .xpath() call returns a list: strings for @href, elements for the others.
    listings_info.append({
        "link": listing.xpath("./atom:link/@href", namespaces=xmlns),
        "title": listing.xpath("./atom:title", namespaces=xmlns),
        "content": listing.xpath("./atom:content", namespaces=xmlns),
    })
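You have to define a prefix (even though the XML itself uses none) and use it in your XPath expressions. That means you have to tell .xpath() which prefix stands for which namespace, hence the second argument.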
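The same prefix mapping can also be passed to findall(), find() and findtext(), which the question used originally. Because findtext() returns a plain string rather than a list, the result can go straight into JSON. The following is only a sketch under a few assumptions: the feed is inlined as SAMPLE instead of being fetched, extract_listings is a hypothetical helper name, and the standard json module stands in for simplejson.

```python
import json

try:
    from lxml import etree
except ImportError:
    # Assumption: stdlib ElementTree accepts the same namespaces= argument here.
    import xml.etree.ElementTree as etree

# Trimmed copy of the feed from the question (assumed representative).
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <link href="http://www.somedomain.org/roll_call/show/1234" rel="alternate"/>
    <title>Once upon a time</title>
    <content type="html">
      This is a book full of words
    </content>
  </entry>
</feed>"""

# The prefix "atom" is an arbitrary choice; only the namespace URI must match the feed.
NS = {"atom": "http://www.w3.org/2005/Atom"}


def extract_listings(root):
    """Return one dict of plain strings per Atom <entry> directly under root."""
    listings = []
    for entry in root.findall("atom:entry", namespaces=NS):
        link = entry.find("atom:link", namespaces=NS)
        listings.append({
            # The URL lives in the href attribute, not in the element text.
            "link": link.get("href") if link is not None else None,
            "title": entry.findtext("atom:title", default="", namespaces=NS).strip(),
            "content": entry.findtext("atom:content", default="", namespaces=NS).strip(),
        })
    return listings


listings_info = extract_listings(etree.fromstring(SAMPLE))
print(json.dumps({"listings": listings_info}))
```

The strip() calls remove the newlines and indentation that surround the text inside <content> in the feed. In the view, the root element would come from tree.getroot() on the parsed feed rather than from SAMPLE.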