How can I get the content of this page?

Date: 2014-10-16 12:40:55

Tags: python sockets urllib

My code is:

import urllib
response = urllib.urlopen("https://namepal.com/").read()
print response

I want to get the content of this page, but it raises an exception:

Traceback (most recent call last):
  File "C:/python27/tcl.py", line 3, in <module>
    response = urllib.urlopen("https://namepal.com/").read()
  File "C:\python27\lib\urllib.py", line 84, in urlopen
    return opener.open(url)
  File "C:\python27\lib\urllib.py", line 205, in open
    return getattr(self, name)(url)
  File "C:\python27\lib\urllib.py", line 435, in open_https
    h.endheaders(data)
  File "C:\python27\lib\httplib.py", line 951, in endheaders
    self._send_output(message_body)
  File "C:\python27\lib\httplib.py", line 811, in _send_output
    self.send(msg)
  File "C:\python27\lib\httplib.py", line 773, in send
    self.connect()
  File "C:\python27\lib\httplib.py", line 1158, in connect
    self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
  File "C:\python27\lib\ssl.py", line 372, in wrap_socket
    ciphers=ciphers)
  File "C:\python27\lib\ssl.py", line 134, in __init__
    self.do_handshake()
  File "C:\python27\lib\ssl.py", line 296, in do_handshake
    self._sslobj.do_handshake()
IOError: [Errno socket error] [Errno 10054] 
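Errno 10054 is the Windows code for a connection reset by the peer (WSAECONNRESET), raised here inside `do_handshake`, so the server dropped the connection during the TLS handshake, before any HTTP was exchanged. The traceback alone does not say why; an old TLS stack in that Python 2.7 build is one plausible cause (Python 2.7 only gained client-side SNI support in 2.7.9), not a confirmed one. For comparison, here is a minimal sketch of the same fetch in Python 3 (an assumption, since the question uses Python 2.7) using `urllib.request` with an explicit default SSL context. The User-Agent string is copied from the question's socket attempt and is only a guess at what the server expects.

```python
import ssl
import urllib.request

# Browser-like User-Agent taken from the question's socket attempt (an assumption
# that the server cares about it at all).
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0"


def make_request(url):
    """Build a GET request carrying an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_text(url, timeout=10.0):
    """Fetch url over HTTPS and decode the body as text.

    create_default_context() uses the system certificate store and current
    TLS defaults; http.client passes the host name to the TLS layer, so SNI
    is sent when the underlying OpenSSL supports it.
    """
    context = ssl.create_default_context()
    with urllib.request.urlopen(make_request(url), timeout=timeout, context=context) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `fetch_text("https://namepal.com/")` returns the decoded page, or raises `urllib.error.URLError` wrapping the underlying socket or SSL error if the handshake still fails.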

So I tried fetching it with a raw socket instead, but it still fails:

import socket
import ssl

sock = ssl.wrap_socket(socket.socket())
#sock=socket.socket()
sock.connect(('namepal.com',80))
sock.sendall('GET / HTTP/1.1\r\n'
             'Host: namepal.com\r\n'
             'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0\r\n'
             'Connection: keep-alive\r\n'
             '\r\n')
response = sock.recv(4096)

print response

It throws a new exception:

Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\test.py", line 6, in <module>
    sock.connect(('namepal.com',80))
  File "C:\Python27\lib\ssl.py", line 322, in connect
    self._real_connect(addr, False)
  File "C:\Python27\lib\ssl.py", line 315, in _real_connect
    raise e
SSLError: [Errno 1] _ssl.c:503: error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol
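`SSL23_GET_SERVER_HELLO:unknown protocol` is what OpenSSL reports when the peer does not answer the TLS ClientHello with TLS. That matches the socket code above: it wraps the socket in SSL but connects to port 80, where the server speaks plain HTTP. A minimal Python 3 sketch of the raw-socket route, assuming the site serves HTTPS on the standard port 443; `build_request`, `fetch_raw` and `split_response` are hypothetical helper names, and `Connection: close` replaces `keep-alive` so that reading until EOF terminates.

```python
import socket
import ssl

# Same browser-like User-Agent as in the question (an assumption, not a requirement).
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0"


def build_request(host, path="/"):
    """Return an HTTP/1.1 GET request with single spaces in the request line."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {USER_AGENT}",
        "Connection: close",  # ask the server to close, so recv() eventually returns b""
        "",
        "",  # the two empty items produce the blank line ending the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host, path="/", port=443, timeout=10.0):
    """Open a TCP connection, negotiate TLS on the HTTPS port, return raw response bytes."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # server_hostname enables SNI and certificate host-name checking.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_request(host, path))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:  # server closed the connection
                    break
                chunks.append(data)
    return b"".join(chunks)


def split_response(raw):
    """Split raw response bytes into (status line, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("iso-8859-1")
    return status_line, body
```

`split_response(fetch_raw("namepal.com"))` gives the status line and the raw body. Because the request is HTTP/1.1, the server may send the body with chunked transfer encoding, which this sketch does not decode; `http.client` and `urllib.request` handle that for you.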

I just want to get the content of this page.

1 Answer:

Answer 0 (score: 1)