Error when reading a web page's source code with Python

Time: 2015-06-16 04:45:11

Tags: python https

I'm new to Python. I've been trying to get the source code of a page, and I have tried several methods on both Python 2 and Python 3 (here is one):

import urllib

url = "https://www.google.ca/?gfe_rd=cr&ei=u6d_VbzoMaei8wfE1oHgBw&gws_rd=ssl#q=test"
f = urllib.urlopen(url)
source = f.read()
print source

But I keep getting the following error:

Traceback (most recent call last):
  File "C:\Python34\openpage.py", line 4, in <module>
    f = urllib.urlopen(url)
  File "C:\Python27\lib\urllib.py", line 87, in urlopen
    return opener.open(url)
  File "C:\Python27\lib\urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 443, in open_https
    h.endheaders(data)
  File "C:\Python27\lib\httplib.py", line 1049, in endheaders
    self._send_output(message_body)
  File "C:\Python27\lib\httplib.py", line 893, in _send_output
    self.send(msg)
  File "C:\Python27\lib\httplib.py", line 855, in send
    self.connect()
  File "C:\Python27\lib\httplib.py", line 1274, in connect
    server_hostname=server_hostname)
  File "C:\Python27\lib\ssl.py", line 352, in wrap_socket
    _context=self)
  File "C:\Python27\lib\ssl.py", line 579, in __init__
    self.do_handshake()
  File "C:\Python27\lib\ssl.py", line 808, in do_handshake
    self._sslobj.do_handshake()
IOError: [Errno socket error] [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)

The last line suggests that the error comes from the secure (HTTPS) search, but I can't seem to find a way around it.

I looked at this post, but still had no luck.

2 answers:

Answer 0 (score: 2)

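You are using https, which is a secure protocol. The error says:

SSL: CERTIFICATE_VERIFY_FAILED

which means Python could not verify the server's certificate against the CA certificates it trusts. Try using http instead, or use the ssl module: https://docs.python.org/2/library/ssl.html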

url = "http://www.google.ca"
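The "use the ssl module" route can be sketched for Python 3 as follows. Instead of switching to plain http or turning verification off, it keeps certificate checking on and lets you point Python at a CA bundle it should trust. `make_context` and `fetch_source` are hypothetical helper names, and whether you need `cafile` at all (for example a PEM file from the third-party certifi package) depends on your installation; treat that as an assumption.

```python
import ssl
import urllib.request


def make_context(cafile=None):
    """Build a TLS context that still verifies server certificates.

    cafile is an optional path to a PEM CA bundle (an assumption about your
    setup); when it is None, the system's default trusted CAs are loaded.
    """
    # create_default_context() enables CERT_REQUIRED and hostname checking
    return ssl.create_default_context(cafile=cafile)


def fetch_source(url, cafile=None, timeout=10):
    """Download a page over HTTPS and return its body as text."""
    ctx = make_context(cafile)
    with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage: `print(fetch_source("https://www.google.ca/")[:200])`. If this still fails with CERTIFICATE_VERIFY_FAILED, the CA store Python uses is the likely culprit, and passing a known-good bundle through `cafile` is usually safer than disabling verification.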

Answer 1 (score: 1)

Here is some sample code using urlparse that you can try on Python 3:

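import http.client
from urllib.parse import urlparse

url = "https://www.google.ca/?gfe_rd=cr&ei=u6d_VbzoMaei8wfE1oHgBw&gws_rd=ssl#q=test"
p = urlparse(url)
# HTTPConnection speaks plain HTTP on port 80, so no TLS handshake or
# certificate check happens. Only p.netloc and p.path are used here:
# the query string and the #fragment are not sent.
conn = http.client.HTTPConnection(p.netloc)
conn.request('GET', p.path)
resp = conn.getresponse()
print('resp= {}'.format(resp.read()))
conn.close()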
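To see what this code actually sends, it helps to look at how urlparse splits the question's URL: scheme `https`, netloc `www.google.ca`, path `/`, query `gfe_rd=cr&ei=...&gws_rd=ssl` and fragment `q=test`. So `conn.request('GET', p.path)` asks only for `/`, and the search term, which sits in the #fragment, never reaches the server (fragments are client-side only). The helper `request_target` below is a hypothetical sketch of how to keep the query string on the request line.

```python
from urllib.parse import urlparse


def request_target(url):
    """Return the path plus query string for the HTTP request line.

    The #fragment is dropped on purpose: it is never sent to the server.
    """
    p = urlparse(url)
    target = p.path or "/"  # an empty path means the site root
    if p.query:
        target += "?" + p.query
    return target
```

Usage: `conn.request('GET', request_target(url))` in place of `conn.request('GET', p.path)`.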

However, the result depends on the arguments you pass to conn.request(). You can try other method types, such as HEAD, and the response will change accordingly.
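The point about HEAD can be tried without contacting Google. This sketch, assuming Python 3 and only the standard library, starts a throwaway local server and sends the same path with GET and with HEAD. `DemoHandler`, `start_demo_server` and `send` are hypothetical names made up for the demo. Both requests return status 200 and the same headers, but the HEAD response has an empty body.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Tiny local handler so the example runs without network access."""

    BODY = b"<html><body>hello</body></html>"

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(self.BODY)

    def do_HEAD(self):
        # HEAD returns the same headers as GET, but no body
        self._send_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def start_demo_server():
    """Start DemoHandler on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send(host, port, method, path="/"):
    """Send one request and return (status, headers dict, body bytes)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        body = resp.read()  # http.client returns b"" for HEAD responses
        return resp.status, dict(resp.getheaders()), body
    finally:
        conn.close()
```

Usage: `server = start_demo_server()`, then compare `send(*server.server_address, "GET")` with `send(*server.server_address, "HEAD")`, and call `server.shutdown()` when done.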

If you want to check whether your request succeeded, you can try:

print(resp.status)

In this case, it gives 200. A list of status codes is available here.
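To make `resp.status` easier to read than a bare number, the standard library's `http.HTTPStatus` enum (Python 3.5+) maps codes to their standard reason phrases. `describe_status` is a hypothetical helper; the grouping by first digit follows the usual 1xx to 5xx status classes.

```python
from http import HTTPStatus

# Status classes are determined by the first digit of the code
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Return a label such as '200 OK (success)' for an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # code not defined in HTTPStatus
        phrase = "Unknown"
    return "{} {} ({})".format(code, phrase, STATUS_CLASSES.get(code // 100, "unknown class"))
```

Usage: `print(describe_status(resp.status))` with the `resp` from the code above. A 3xx result means the server is redirecting you, for example from http to https.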

You can also find some other examples.