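Hi everyone! I want to access some web pages from a Python script. The URL is: http://www.idealo.de/preisvergleich/Shop/27039.html
It works fine when I open it in a web browser, but when I try to access it with urllib2: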
a = urllib2.urlopen("http://www.idealo.de/preisvergleich/Shop/27039.html")
it gives me the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
I also tried to access it with wget:
wget http://www.idealo.de/preisvergleich/Shop/27039.html
The error is:
--2012-04-23 12:42:03-- http://www.idealo.de/preisvergleich/Shop/27039.html
Resolving www.idealo.de (www.idealo.de)... 62.146.49.133
Connecting to www.idealo.de (www.idealo.de)|62.146.49.133|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-04-23 12:42:03 ERROR 403: Forbidden.
Can anyone explain why this happens? How can I access the page with Python?
Answer 0: (score: 5)
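They block certain user agents. If you try the following:
wget -U "Mozilla/5.0" http://www.idealo.de/preisvergleich/Shop/27039.html
it works. So you need to send a browser-like User-Agent header from your Python code for the request to succeed.
Try this: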
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
a = opener.open("http://www.idealo.de/preisvergleich/Shop/27039.html")
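If you would rather set the header per request and see the status code instead of a traceback, something like the following might work. It is only a sketch: the helper names build_request and fetch are made up for illustration, the import fallback is there so the same code runs on Python 3 as well as the Python 2 used in the question, and it assumes the site still accepts a "Mozilla/5.0" user agent, which may no longer be true.

```python
try:
    # Python 3 location of the same classes
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError
except ImportError:
    # Python 2, as used in the question
    from urllib2 import Request, urlopen, HTTPError


def build_request(url, user_agent="Mozilla/5.0"):
    """Hypothetical helper: a Request carrying a browser-like User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent="Mozilla/5.0"):
    """Hypothetical helper: return (status, body), or (error_code, None)
    when the server answers with an HTTP error such as 403."""
    try:
        response = urlopen(build_request(url, user_agent))
        return response.getcode(), response.read()
    except HTTPError as err:
        return err.code, None
```

Calling fetch("http://www.idealo.de/preisvergleich/Shop/27039.html") then gives you a status code you can check, so a 403 shows up as a return value rather than an uncaught exception.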