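I'm trying to use Python's urllib (Python 3.5) to request HTML from a website. I watched some videos on how to scrape content from the web, and most of them teach you to set headers so that the bot looks like a browser.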
import urllib.request, urllib.parse
url = 'http://www.google.com/search?'
values = {
    'q': 'hello',
    'oq': 'hello',
    'num': '100'
}
headers = {}
headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'
url = url + urllib.parse.urlencode(values)
req = urllib.request.Request(url,headers = headers)
resp = urllib.request.urlopen(req)
respData = resp.read()
The code always ends with a Service Unavailable error like the one below:
Traceback (most recent call last):
File "C:/Users/f550vc/Desktop/google count.py", line 18, in <module>
resp = urllib.request.urlopen(req)
File "C:\Python35\lib\urllib\request.py", line 162, in urlopen
return opener.open(url, data, timeout)
File "C:\Python35\lib\urllib\request.py", line 471, in open
response = meth(req, response)
File "C:\Python35\lib\urllib\request.py", line 581, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python35\lib\urllib\request.py", line 503, in error
result = self._call_chain(*args)
File "C:\Python35\lib\urllib\request.py", line 443, in _call_chain
result = func(*args)
File "C:\Python35\lib\urllib\request.py", line 686, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "C:\Python35\lib\urllib\request.py", line 471, in open
response = meth(req, response)
File "C:\Python35\lib\urllib\request.py", line 581, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python35\lib\urllib\request.py", line 503, in error
result = self._call_chain(*args)
File "C:\Python35\lib\urllib\request.py", line 443, in _call_chain
result = func(*args)
File "C:\Python35\lib\urllib\request.py", line 686, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "C:\Python35\lib\urllib\request.py", line 471, in open
response = meth(req, response)
File "C:\Python35\lib\urllib\request.py", line 581, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python35\lib\urllib\request.py", line 503, in error
result = self._call_chain(*args)
File "C:\Python35\lib\urllib\request.py", line 443, in _call_chain
result = func(*args)
File "C:\Python35\lib\urllib\request.py", line 686, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "C:\Python35\lib\urllib\request.py", line 471, in open
response = meth(req, response)
File "C:\Python35\lib\urllib\request.py", line 581, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python35\lib\urllib\request.py", line 509, in error
return self._call_chain(*args)
File "C:\Python35\lib\urllib\request.py", line 443, in _call_chain
result = func(*args)
File "C:\Python35\lib\urllib\request.py", line 589, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Unavailable
I have tried mechanicalsoup and mechanize, but for certain reasons I really need to know how to do this without them, using only urllib.
Answer 0 (score: 1)
The following code works on Python 3.5 (if it doesn't, try uncommenting the second block and using that one instead):
import urllib.request

url = 'https://www.google.com'

# Option 1: short User-Agent string
try:
    user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    head = {'User-Agent': user_agent}
    req = urllib.request.Request(url, headers=head)
    res = urllib.request.urlopen(req)
    print(res.read().decode('utf-8'))
except Exception as e:
    print(str(e))

# Option 2 (commented out): full Chrome User-Agent string
'''
try:
    headers = {}
    headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
    req = urllib.request.Request(url, headers=headers)
    resp = urllib.request.urlopen(req)
    respData = resp.read()
    respData = respData.decode('utf-8')
    print(respData)
except Exception as e:
    print(str(e))
'''
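The traceback in the question passes through http_error_302 several times before ending in the 503, which suggests the request was redirected before it was finally refused. Catching urllib.error.HTTPError specifically, rather than a bare Exception, lets you see the status code and reason. Below is a minimal sketch of that idea, not a guaranteed fix for the 503. The names build_request, fetch and DEFAULT_UA are my own hypothetical helpers, not part of urllib, and the search URL and parameters are simply the ones from the question.

```python
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical names for illustration; none of these come from urllib itself.
DEFAULT_UA = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'


def build_request(base_url, params, user_agent=DEFAULT_UA):
    """Return a Request for base_url with params encoded into the query string."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(base_url + '?' + query,
                                  headers={'User-Agent': user_agent})


def fetch(req, timeout=10):
    """Return (status, text); on an HTTP error return (code, reason) instead."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = resp.headers.get_content_charset() or 'utf-8'
            return resp.status, resp.read().decode(charset, errors='replace')
    except urllib.error.HTTPError as e:
        # e.code is the numeric HTTP status (503 in the question's traceback).
        return e.code, e.reason


# Building the request does not touch the network; only fetch(req) does.
req = build_request('http://www.google.com/search', {'q': 'hello', 'num': '100'})
print(req.full_url)
```

Calling fetch(req) performs the actual request. If it still returns 503, the status and reason at least confirm that the server is refusing the request rather than the script being malformed.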