I want to fetch information from Bugzilla (bugzilla.mozilla.org).
When I write the following code,
import httplib
host = 'bugzilla.mozilla.org'
h = httplib.HTTPSConnection(host)
h.putrequest('GET', 'https://bugzilla.mozilla.org/index.cgi')
h.putheader('Accept', 'application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, */*')
h.putheader('User-Agent', "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)")
h.putheader('Host', host)
h.putheader('Connection', 'Keep-Alive')
h.endheaders()
response = h.getresponse()
print response.read()
the server always returns:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://bugzilla.mozilla.org/index.cgi">here</a>.</p>
</body></html>
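However, the same code works fine against other HTTPS servers. Does anyone know what I am doing wrong?
Answer 0 (score: 1)
httplib does not follow redirects (such as HTTP 301); you can use urllib2 instead: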
from urllib2 import Request, urlopen
req = Request('https://bugzilla.mozilla.org/index.cgi')
req.add_header('Accept', 'application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, */*')
req.add_header('User-Agent', "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)")
response = urlopen(req)  # NOTE: on Python 2 before 2.7.9, urlopen does not verify the server's SSL certificate
print(response.headers)
content = response.read()
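If you would rather stay with the low-level client, redirects can be followed by hand. The following is only a minimal sketch, written for Python 3, where httplib is named http.client. The helper name fetch, its max_redirects and connection_factory parameters, and the simplified Accept header are hypothetical and not part of any library, and it assumes every hop is served over HTTPS. It also sends only the path in the request line and lets the library add the Host header itself; whether the full URL or the manually added Host header is what triggers the 301 in the question is an assumption, not something confirmed here.

```python
# Python 3 sketch: http.client is the Python 3 name for httplib.
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch(url, max_redirects=5, connection_factory=http.client.HTTPSConnection):
    """Hypothetical helper: GET url and follow redirects manually.

    Assumes every hop uses HTTPS. Returns (status, final_url, body).
    """
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        # Put only the path (plus query) in the request line, not the full URL.
        target = parts.path or '/'
        if parts.query:
            target += '?' + parts.query
        conn = connection_factory(parts.netloc)
        try:
            # request() adds the Host header itself, so it is not set manually.
            conn.request('GET', target, headers={'Accept': '*/*'})
            response = conn.getresponse()
            status = response.status
            location = response.getheader('Location')
            body = response.read()
        finally:
            conn.close()
        if status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
            continue
        return status, url, body
    raise RuntimeError('too many redirects')
```

Calling fetch('https://bugzilla.mozilla.org/index.cgi') would return the final status, the URL that was finally served, and the body. If the server keeps redirecting to the same address, the loop stops after max_redirects hops and raises RuntimeError instead of spinning forever.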