Question

For a project, I decided to build an application that helps people find friends on Twitter.

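I have already been able to get usernames from the XML page. For example, with my current code I can get <uri>http://twitter.com/username</uri> from the XML page, but I want to use Beautiful Soup to strip the <uri> and </uri> tags.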

Here is my current code:

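# Python 2 with BeautifulSoup 3
import urllib
# BeautifulStoneSoup is BeautifulSoup 3's parser for generic XML;
# it must be imported by name, not just via "import BeautifulSoup"
from BeautifulSoup import BeautifulStoneSoup

doc = urllib.urlopen("http://search.twitter.com/search.atom?q=travel").read()

# read() already returns a string, so no ''.join() is needed
soup = BeautifulStoneSoup(doc)
data = soup.findAll("uri")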

Answer 1

Don't use BeautifulSoup to parse Twitter, use their API (and don't use BeautifulSoup either, use lxml). To answer your question:

import urllib
from BeautifulSoup import BeautifulSoup

resp = urllib.urlopen("http://search.twitter.com/search.atom?q=travel")
soup = BeautifulSoup(resp.read())
for uri in soup.findAll('uri'):
    # extract() detaches each <uri> element (tag and its text) from the tree
    uri.extract()

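Following the lxml suggestion above, here is a minimal sketch that uses Python 3's standard-library xml.etree.ElementTree instead (its iter/find style resembles lxml's etree, but this is a swapped-in stdlib module, not lxml itself). The feed is a hand-written, hypothetical sample shaped like an Atom feed, and extract_uris is an illustrative helper name. Because Atom elements live in the http://www.w3.org/2005/Atom namespace, the tag has to be written in {namespace}uri form; reading .text then gives the URL without the surrounding <uri> and </uri> tags.

```python
import xml.etree.ElementTree as ET

# Atom elements are namespaced; ElementTree expects "{namespace}tag" when searching.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical, hand-written sample in the shape of an Atom search feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <author>
      <name>alice</name>
      <uri>http://twitter.com/alice</uri>
    </author>
  </entry>
  <entry>
    <author>
      <name>bob</name>
      <uri>http://twitter.com/bob</uri>
    </author>
  </entry>
</feed>"""


def extract_uris(feed_xml):
    """Return the text inside every Atom <uri> element, without the tags."""
    root = ET.fromstring(feed_xml)
    # iter() walks the whole tree, so nesting depth does not matter
    return [uri.text.strip() for uri in root.iter(ATOM_NS + "uri") if uri.text]


uris = extract_uris(SAMPLE_FEED)
print(uris)  # ['http://twitter.com/alice', 'http://twitter.com/bob']
```

Unlike extract(), which removes the elements from the tree, this keeps only the text content, which is what the question asks for.
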
Answer 2

To answer your BeautifulSoup question: you need .text to get the contents of each <uri> tag. Here I pull that information out with a list comprehension:

>>> uris = [uri.text for uri in soup.findAll('uri')]
>>> len(uris)
15
>>> print uris[0]
http://twitter.com/MarieJeppesen

However, as zeekay says, Twitter's REST API is a better way to query Twitter.
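Since the goal is usernames rather than full URLs, a list like uris above can be reduced one step further. This is a minimal sketch, assuming every profile URI follows the http://twitter.com/username pattern shown in the question; username_from_uri is a hypothetical helper name. It is written for Python 3 (in Python 2 the same function is imported with from urlparse import urlparse).

```python
from urllib.parse import urlparse


def username_from_uri(uri):
    """Return the last non-empty path segment of a profile URL, or None."""
    path = urlparse(uri).path  # '/alice' for 'http://twitter.com/alice'
    segments = [s for s in path.split("/") if s]
    return segments[-1] if segments else None


uris = ["http://twitter.com/alice", "http://twitter.com/bob/"]
print([username_from_uri(u) for u in uris])  # ['alice', 'bob']
```

Splitting the parsed path, rather than slicing off a fixed "http://twitter.com/" prefix, also tolerates a trailing slash or an https scheme.
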

Python XML parser with BeautifulSoup: how do I remove the tags?
