I'm new to Python and have been trying to fetch the source code of a page. I've tried several approaches on both Python 2 and 3 (here is one of them):
import urllib
url = "https://www.google.ca/?gfe_rd=cr&ei=u6d_VbzoMaei8wfE1oHgBw&gws_rd=ssl#q=test"
f = urllib.urlopen(url)
source = f.read()
print source
But I keep getting the following error:
Traceback (most recent call last):
File "C:\Python34\openpage.py", line 4, in <module>
f = urllib.urlopen(url)
File "C:\Python27\lib\urllib.py", line 87, in urlopen
return opener.open(url)
File "C:\Python27\lib\urllib.py", line 213, in open
return getattr(self, name)(url)
File "C:\Python27\lib\urllib.py", line 443, in open_https
h.endheaders(data)
File "C:\Python27\lib\httplib.py", line 1049, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 893, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 855, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 1274, in connect
server_hostname=server_hostname)
File "C:\Python27\lib\ssl.py", line 352, in wrap_socket
_context=self)
File "C:\Python27\lib\ssl.py", line 579, in __init__
self.do_handshake()
File "C:\Python27\lib\ssl.py", line 808, in do_handshake
self._sslobj.do_handshake()
IOError: [Errno socket error] [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)
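The last line suggests the error comes from the secure search, but I can't seem to find a way around it.
I looked at this post, but still had no success.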
Answer 0 (score: 2)
You are using https, which is a secure protocol. The error says:
SSL: CERTIFICATE_VERIFY_FAILED
Try using http instead, or use the ssl module (https://docs.python.org/2/library/ssl.html):
url = "http://www.google.ca"
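If you would rather stay on https, here is a minimal sketch of the "use ssl" option. It assumes Python 3 (the docs link above is for Python 2), and the helper names `make_verified_context` and `fetch` are made up for illustration. Whether it actually clears the error depends on the CA certificates installed on your machine, so treat it as a starting point rather than a guaranteed fix.

```python
import ssl
import urllib.request


def make_verified_context():
    # create_default_context() turns on certificate and hostname checks
    # and loads the system's trusted CA certificates.
    return ssl.create_default_context()


def fetch(url, context=None):
    # Hypothetical helper: returns the response body as bytes.
    if context is None:
        context = make_verified_context()
    with urllib.request.urlopen(url, context=context) as resp:
        return resp.read()
```

You would call it as `source = fetch("https://www.google.ca/")` and decode the bytes before printing.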
Answer 1 (score: 1)
Here is how you can do it using urlparse:
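import http.client
from urllib.parse import urlparse

url = "https://www.google.ca/?gfe_rd=cr&ei=u6d_VbzoMaei8wfE1oHgBw&gws_rd=ssl#q=test"
p = urlparse(url)  # p.netloc is "www.google.ca", p.path is "/"
# HTTPConnection talks plain HTTP (port 80), not HTTPS,
# so no certificate verification takes place here.
conn = http.client.HTTPConnection(p.netloc)
conn.request('GET', p.path)  # the query string and fragment are not sent
resp = conn.getresponse()
print('resp= {}'.format(resp.read()))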
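However, what you get back depends on the arguments you pass to conn.request(). You can try other method types, such as HEAD, and the response will change accordingly.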
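To see how the method changes the response without depending on an external site, here is a small sketch that starts a throwaway local server and sends it both a GET and a HEAD. The handler, `BODY`, and the helper names are all invented for this illustration; only the `http.client` and `http.server` calls are standard library.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html>hello</html>"  # made-up page content for the demo


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        # GET: headers followed by the body.
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        # HEAD: same headers, no body.
        self._send_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def request(method, port, path="/"):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


def compare_get_and_head():
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return request("GET", port), request("HEAD", port)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `compare_get_and_head()` returns a `(status, body)` pair for each method. Both carry the same status, but the HEAD body is empty because the server only sends headers.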
If you want to check whether your request succeeded, you can try:
print(resp.status)
In this case it gives 200. A list of status codes is available here.
Some other examples can also be found.
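If you want a readable name for whatever resp.status gives you, the standard library's `http.HTTPStatus` enum (Python 3.5+) can look it up. A minimal sketch, where `describe_status` is a made-up helper name:

```python
from http import HTTPStatus


def describe_status(code):
    # HTTPStatus(code) raises ValueError for codes it does not know.
    try:
        status = HTTPStatus(code)
    except ValueError:
        return "{} Unknown status".format(code)
    return "{} {}".format(status.value, status.phrase)
```

For example, `print(describe_status(resp.status))` would print the code together with its standard reason phrase.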