I'm having a problem using Python's urllib.
Here is the code I tried:
import urllib
s = urllib.urlopen("https://www.mci.ir/web/guest/login")
This is the error I see:
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
s = urllib.urlopen("https://www.mci.ir/web/guest/login")
File "C:\Python27\lib\urllib.py", line 86, in urlopen
return opener.open(url)
File "C:\Python27\lib\urllib.py", line 207, in open
return getattr(self, name)(url)
File "C:\Python27\lib\urllib.py", line 450, in open_https
return self.http_error(url, fp, errcode, errmsg, headers)
File "C:\Python27\lib\urllib.py", line 371, in http_error
result = method(url, fp, errcode, errmsg, headers)
File "C:\Python27\lib\urllib.py", line 634, in http_error_302
data)
File "C:\Python27\lib\urllib.py", line 660, in redirect_internal
return self.open(newurl)
File "C:\Python27\lib\urllib.py", line 207, in open
return getattr(self, name)(url)
File "C:\Python27\lib\urllib.py", line 436, in open_https
h.endheaders(data)
File "C:\Python27\lib\httplib.py", line 954, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 814, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 776, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 1161, in connect
self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
File "C:\Python27\lib\ssl.py", line 381, in wrap_socket
ciphers=ciphers)
File "C:\Python27\lib\ssl.py", line 143, in __init__
self.do_handshake()
File "C:\Python27\lib\ssl.py", line 305, in do_handshake
self._sslobj.do_handshake()
IOError: [Errno socket error] [Errno 8] _ssl.c:504: EOF occurred in violation of protocol
Answer 0 (score: 2)
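The remote server doesn't seem to like the User-Agent header sent by urllib.urlopen() and urllib2.urlopen() (Python 2), nor the one sent by urllib.request.urlopen() (Python 3). It closes the connection, which surfaces as the "EOF occurred in violation of protocol" error.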
Making the request with the requests package does work:
>>> import requests
>>> r = requests.get('https://www.mci.ir/web/guest/login')
>>> r
<Response [200]>
Setting the User-Agent to the one that urllib/urllib2 sends reproduces the failure:
>>> r = requests.get('https://www.mci.ir/web/guest/login', headers={'User-Agent': 'Python-urllib/2.7'})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mhawke/virtualenvs/py2/lib/python2.7/site-packages/requests/api.py", line 69, in get
return request('get', url, params=params, **kwargs)
File "/home/mhawke/virtualenvs/py2/lib/python2.7/site-packages/requests/api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "/home/mhawke/virtualenvs/py2/lib/python2.7/site-packages/requests/sessions.py", line 465, in request
resp = self.send(prep, **send_kwargs)
File "/home/mhawke/virtualenvs/py2/lib/python2.7/site-packages/requests/sessions.py", line 594, in send
history = [resp for resp in gen] if allow_redirects else []
File "/home/mhawke/virtualenvs/py2/lib/python2.7/site-packages/requests/sessions.py", line 196, in resolve_redirects
**adapter_kwargs
File "/home/mhawke/virtualenvs/py2/lib/python2.7/site-packages/requests/sessions.py", line 573, in send
r = adapter.send(request, **kwargs)
File "/home/mhawke/virtualenvs/py2/lib/python2.7/site-packages/requests/adapters.py", line 431, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:590)
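My recommendation is to use requests because it is a better library. If you must use the standard library, however, use urllib2 and set a User-Agent header that the remote server accepts: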
import urllib2
req = urllib2.Request('https://www.mci.ir/web/guest/login')
req.add_header('User-Agent','Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0')
r = urllib2.urlopen(req)
html = r.read()
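On Python 3, where urllib2 was merged into urllib.request, a rough equivalent might look like the sketch below. The helper names build_request and fetch are made up for illustration, and the User-Agent string is simply the Firefox one from the snippet above; whether the server still accepts it is an assumption.

```python
import urllib.request

URL = 'https://www.mci.ir/web/guest/login'
# Browser-like User-Agent copied from the urllib2 example; any value the
# server accepts should work equally well (assumption).
USER_AGENT = ('Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:38.0) '
              'Gecko/20100101 Firefox/38.0')


def build_request(url, user_agent=USER_AGENT):
    """Return a Request whose User-Agent replaces the default Python-urllib/x.y."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch(url, user_agent=USER_AGENT, timeout=10):
    """Open the URL with the custom User-Agent and return the body as bytes."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling fetch(URL) should then return the page body as bytes, provided the server accepts the header.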
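Another thing worth noting: once the remote server receives a request it doesn't like (for example, one with an unaccepted User-Agent), it appears to block requests from the originating IP address until no requests have arrived for some time (or possibly for a random period).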
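If the server really does block an address for a while after a rejected request, one way to cope is to pause between attempts instead of retrying immediately. The following is only a sketch: fetch_with_backoff is a hypothetical helper, and the 30-second starting delay and the doubling are guesses, since the actual length of the block is unknown.

```python
import time


def fetch_with_backoff(fetch, url, attempts=4, first_delay=30.0, sleep=time.sleep):
    """Call fetch(url); on IOError, wait and retry, doubling the wait each time.

    fetch is any callable that takes a URL and returns the response body.
    The delay values are assumptions, not measured server behaviour.
    """
    delay = first_delay
    for attempt in range(attempts):
        try:
            return fetch(url)
        except IOError:
            # The socket/SSL errors in the tracebacks above are IOError subclasses.
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(delay)
            delay *= 2
```

The fetch argument can be any function that performs the request with an acceptable User-Agent; passing sleep as a parameter keeps the helper easy to exercise without real waiting.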
Answer 1 (score: -1)
I ran into the same problem and fixed it by using Python 3.
File "/usr/lib/python2.7/ssl.py", line 830, in do_handshake
self._sslobj.do_handshake()
IOError: [Errno socket error] EOF occurred in violation of protocol (_ssl.c:590)