Python httplib.HTTPSConnection does not connect correctly (to bugzilla.mozilla.org)?

Time: 2014-02-06 10:08:58

Tags: python python-2.7 https bugzilla httplib

I want to fetch information from Bugzilla (bugzilla.mozilla.org).

When I write the following code,

import httplib
host = 'bugzilla.mozilla.org'

h = httplib.HTTPSConnection(host)
h.putrequest('GET', 'https://bugzilla.mozilla.org/index.cgi')
h.putheader('Accept', 'application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, */*')
h.putheader('User-Agent', "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)")
h.putheader('Host', host)
h.putheader('Connection', 'Keep-Alive')
h.endheaders()

response = h.getresponse()
print response.read()

the server always returns

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://bugzilla.mozilla.org/index.cgi">here</a>.</p>
</body></html>

The same code works fine against other HTTPS servers, though. Does anyone know what I am doing wrong?

1 answer:

Answer 0 (score: 1)

httplib does not follow redirects (HTTP status code 301); you can use urllib2 instead:

from urllib2 import Request, urlopen

req = Request('https://bugzilla.mozilla.org/index.cgi')
req.add_header('Accept', 'application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, */*')
req.add_header('User-Agent', "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)")
response = urlopen(req)  # NOTE: before Python 2.7.9, urlopen does not verify the server's SSL certificate
print(response.headers)
content = response.read()
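
If you would rather stay with httplib, a minimal sketch of following the redirect by hand might look like the block below. It is an illustration, not a drop-in fix for the question: the names `get_following_redirects`, `REDIRECT_CODES` and the `connections` parameter are made up for this example, and the import fallback is only there so the same code also runs on Python 3. It passes only the path to `request()`, since httplib adds the Host header itself, and resolves each `Location` header against the URL that was just requested.

```python
try:  # Python 2, as in the question
    import httplib
    from urlparse import urljoin, urlsplit
except ImportError:  # Python 3 renamed both modules
    import http.client as httplib
    from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = (301, 302, 303, 307, 308)


def get_following_redirects(url, headers=None, max_redirects=5, connections=None):
    """GET url with httplib, following at most max_redirects Location headers.

    Returns (final_url, response); the caller reads the response body.
    """
    # Map each URL scheme to the connection class that speaks it
    # (hypothetical parameter, mainly useful for swapping in test doubles).
    connections = connections or {
        'http': httplib.HTTPConnection,
        'https': httplib.HTTPSConnection,
    }
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        # Send only the path and query; httplib adds the Host header itself.
        target = parts.path or '/'
        if parts.query:
            target += '?' + parts.query
        conn = connections[parts.scheme](parts.netloc)
        conn.request('GET', target, headers=headers or {})
        response = conn.getresponse()
        location = response.getheader('Location')
        if response.status not in REDIRECT_CODES or not location:
            return url, response
        # Resolve a relative Location against the URL just requested.
        url = urljoin(url, location)
        conn.close()
    raise RuntimeError('gave up after %d redirects' % max_redirects)
```

Calling `get_following_redirects('https://bugzilla.mozilla.org/index.cgi')` would then give back the last URL requested together with its response object, on which you call `read()` as before. The redirect limit is there so that a server redirecting in a loop raises an error instead of hanging the script.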