When I try to scrape Twitter with this code:
import urllib2
s = "https://mobile.twitter.com/bing/"
html = urllib2.urlopen(s).read()
print html
...I get the following error:
Traceback (most recent call last):
  File "C:\Users\arpit\Downloads\Desktop\Wiki Code\final Crawler_wiki.py", line 14, in <module>
    html = urllib2.urlopen(s).read()
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 418, in _open
    '_open', req)
  File "C:\Python27\lib\urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1215, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "C:\Python27\lib\urllib2.py", line 1177, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>
If I replace mobile.twitter.com with twitter.com, it works fine, but I want it to work with mobile.twitter.com.
Answer 0 (score: 0)
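The Twitter site may be checking for a browser user agent, which is not set when you make the request through the urllib API (by default urllib2 identifies itself as Python-urllib/x.y).
You may need to use something like mechanize to fake your user agent.
That said, I strongly recommend using the Twitter API instead, which offers many simple and interesting ways to work with the data.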
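The user-agent suggestion can also be tried without mechanize, because urllib2 accepts custom headers on a Request object. The following is a minimal sketch, not a confirmed fix: the USER_AGENT string is a hypothetical browser-like value, and build_request and fetch are helper names introduced here for illustration. Note that Errno 10061 is raised when the TCP connection itself is refused, so changing the header may not be enough on its own.

```python
try:
    import urllib2 as request_lib  # Python 2, as used in the question
except ImportError:
    import urllib.request as request_lib  # Python 3 equivalent module

URL = "https://mobile.twitter.com/bing/"

# Assumption: a browser-like value chosen for illustration only.
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries a custom User-Agent header."""
    return request_lib.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Send the request over the network and return the raw response body."""
    return request_lib.urlopen(build_request(url)).read()
```

Calling fetch(URL) sends the request with the replaced header; if the same URLError still appears, the cause is more likely a proxy, firewall, or network restriction than the user agent.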