How do I scrape a Twitter page with Python?

Time: 2013-02-18 10:50:37

Tags: python twitter python-2.7

When I try to scrape Twitter with this code:

import urllib2
s = "https://mobile.twitter.com/bing/"
html = urllib2.urlopen(s).read()
print html

...I get the following error:

Traceback (most recent call last):
  File "C:\Users\arpit\Downloads\Desktop\Wiki Code\final Crawler_wiki.py", line 14, in <module>
    html = urllib2.urlopen(s).read()
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 418, in _open
    '_open', req)
  File "C:\Python27\lib\urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1215, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "C:\Python27\lib\urllib2.py", line 1177, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>

If I replace mobile.twitter.com with twitter.com, the code runs fine, but I want it to work with mobile.twitter.com.

1 answer:

Answer 0 (score: 0)

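The Twitter site may be checking for a User-Agent header that you are not setting when you make the request through the urllib2 API (unless told otherwise, urllib2 identifies itself with its own default, such as Python-urllib/2.7).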

You may need to use something like mechanize to spoof your user agent.
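As a rough illustration of the user-agent idea, the sketch below sets the header through urllib2's own Request object instead of mechanize, so it is a substitute technique rather than the approach named above. The USER_AGENT string and the helper names build_request and fetch are hypothetical and chosen only for this example. Whether a different user agent actually clears the "connection refused" error from the question is not confirmed, since that error can also come from a proxy or firewall.

```python
# Minimal sketch: send an explicit, browser-like User-Agent header.
try:
    import urllib2 as request_lib          # Python 2.7, as in the question
except ImportError:
    import urllib.request as request_lib   # Python 3 fallback

# Hypothetical value; any browser-like string could be substituted here.
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; rv:18.0) Gecko/20100101 Firefox/18.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries an explicit User-Agent header."""
    return request_lib.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT, timeout=10):
    """Open the URL with the custom header and return the raw response body."""
    response = request_lib.urlopen(build_request(url, user_agent), timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

With these helpers, calling fetch("https://mobile.twitter.com/bing/") would take the place of the urllib2.urlopen(s).read() line in the question. Building the request is kept separate from the network call so the header can be inspected without contacting the server.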

However, I would strongly recommend using the Twitter API instead, which offers many simple and interesting ways to work with the data.