urllib User-Agent header not working?

Asked: 2016-02-22 09:09:53

Tags: urllib python-3.5

I'm trying to request HTML from a website using urllib in Python 3.5. I watched some videos about web scraping, and most of them teach you to set headers so that the bot looks like a browser.

import urllib.request, urllib.parse


url = 'http://www.google.com/search?'
values = {
    'q':'hello',
    'oq':'hello',
    'num':'100'
    }

headers = {}
headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'


url = url + urllib.parse.urlencode(values)

req = urllib.request.Request(url,headers = headers)
resp = urllib.request.urlopen(req)

respData = resp.read()

Running the code always gives me a Service Unavailable error like the one below:

Traceback (most recent call last):
  File "C:/Users/f550vc/Desktop/google count.py", line 18, in <module>
    resp = urllib.request.urlopen(req)
  File "C:\Python35\lib\urllib\request.py", line 162, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python35\lib\urllib\request.py", line 471, in open
    response = meth(req, response)
  File "C:\Python35\lib\urllib\request.py", line 581, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python35\lib\urllib\request.py", line 503, in error
    result = self._call_chain(*args)
  File "C:\Python35\lib\urllib\request.py", line 443, in _call_chain
    result = func(*args)
  File "C:\Python35\lib\urllib\request.py", line 686, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python35\lib\urllib\request.py", line 471, in open
    response = meth(req, response)
  File "C:\Python35\lib\urllib\request.py", line 581, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python35\lib\urllib\request.py", line 503, in error
    result = self._call_chain(*args)
  File "C:\Python35\lib\urllib\request.py", line 443, in _call_chain
    result = func(*args)
  File "C:\Python35\lib\urllib\request.py", line 686, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python35\lib\urllib\request.py", line 471, in open
    response = meth(req, response)
  File "C:\Python35\lib\urllib\request.py", line 581, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python35\lib\urllib\request.py", line 503, in error
    result = self._call_chain(*args)
  File "C:\Python35\lib\urllib\request.py", line 443, in _call_chain
    result = func(*args)
  File "C:\Python35\lib\urllib\request.py", line 686, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python35\lib\urllib\request.py", line 471, in open
    response = meth(req, response)
  File "C:\Python35\lib\urllib\request.py", line 581, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python35\lib\urllib\request.py", line 509, in error
    return self._call_chain(*args)
  File "C:\Python35\lib\urllib\request.py", line 443, in _call_chain
    result = func(*args)
  File "C:\Python35\lib\urllib\request.py", line 589, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Unavailable

I have tried mechanicalsoup and mechanize, but for certain reasons I really need to know how to do this with urllib alone, without them.
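One way to check whether the User-Agent header is really attached, before any network traffic happens, is to inspect the Request object itself. The following is a minimal offline sketch: `build_search_request` is a hypothetical helper name, and the URL, parameters, and User-Agent string are copied from the code above. Note that urllib stores header names via `str.capitalize()`, so the key to look up is `User-agent`, not `User-Agent`.

```python
import urllib.parse
import urllib.request

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36"
)


def build_search_request(base_url, params, user_agent=USER_AGENT):
    """Hypothetical helper: build a GET Request carrying a custom User-Agent."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        base_url + query, headers={"User-Agent": user_agent}
    )


req = build_search_request(
    "http://www.google.com/search?",
    {"q": "hello", "oq": "hello", "num": "100"},
)

# No request is sent here; this only inspects what would be sent.
print(req.full_url)
# Header names are stored capitalized, hence "User-agent".
print(req.get_header("User-agent"))
```

If the second line prints the browser string, the header is set correctly, and a 503 would then be the server's decision rather than a sign that urllib dropped the header.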

1 answer:

Answer 0: (score: 1)

The following code works on Python 3.5 (to try the alternative, uncomment the second block and use it instead):

import urllib.request

url = 'https://www.google.com'
try:
    user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    head = { 'User-Agent' : user_agent }
    req = urllib.request.Request(url,headers = head)
    res = urllib.request.urlopen(req)
    print(res.read().decode('utf-8'))
except Exception as e:
    print(str(e))
'''
try:
    headers = {}
    headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
    req = urllib.request.Request(url, headers = headers)
    resp = urllib.request.urlopen(req)
    respData = resp.read()
    respData = respData.decode('utf-8')
    print(respData)

except Exception as e:
    print(str(e))
'''
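Printing `str(e)` hides most of what the server said. As a rough sketch, assuming the goal is to see where the request ended up, one could catch `urllib.error.HTTPError` specifically and report its status code, reason, and URL. In the traceback from the question the error is raised with `req.full_url` after several `http_error_302` hops, so the reported URL should be the redirect target rather than the original address. `describe_http_error` and `fetch` are hypothetical names, the `Retry-After` lookup is only an illustration of reading a response header, and the demo below builds a fake error so that it runs without network access.

```python
import email.message
import io
import urllib.error
import urllib.request


def describe_http_error(err):
    """Hypothetical helper: summarize an HTTPError in one line."""
    retry_after = None
    if err.headers is not None:
        retry_after = err.headers.get("Retry-After")
    return "HTTP {} {} at {} (Retry-After: {})".format(
        err.code, err.reason, err.url, retry_after
    )


def fetch(url, user_agent, timeout=10):
    """Return the decoded body, or None after printing an error summary."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        print(describe_http_error(err))
        return None


# Offline demo with a hand-built error; the URL and header value are made up.
fake_headers = email.message.Message()
fake_headers["Retry-After"] = "120"
fake_error = urllib.error.HTTPError(
    "http://example.com/blocked", 503, "Service Unavailable",
    fake_headers, io.BytesIO(b""),
)
summary = describe_http_error(fake_error)
print(summary)
```

In real use, `fetch(url, user_agent)` would be called with the same values as in the answer above. Catching only `HTTPError` also lets unrelated failures such as DNS errors surface with their own tracebacks.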