Convert gzipped data fetched by urllib2 to HTML

Posted: 2009-11-09 23:55:19

Tags: python gzip urllib2

I currently use mechanize to read gzipped web pages, like this:

import mechanize

br = mechanize.Browser()
br.set_handle_gzip(True)  # let mechanize decompress gzip responses transparently
response = br.open(url)
data = response.read()

How can I decompress gzipped data fetched with urllib2 into HTML text?

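import urllib2

req = urllib2.Request(url)
req.add_header('Accept-Encoding', 'gzip')  # urllib2 does not ask for gzip by default
opener = urllib2.build_opener()
response = opener.open(req)
data = response.read()
if response.info().get('Content-Encoding') == 'gzip':
    # HOW TO DECOMPRESS DATA TO HTML?
    pass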

2 answers:

Answer 0 (score: 14)

Try this:

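import gzip
import StringIO

# GzipFile needs a file-like object, so wrap the compressed string in StringIO
buf = StringIO.StringIO(data)
gzipper = gzip.GzipFile(fileobj=buf)
html = gzipper.read()  # the decompressed HTML, as a byte string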

html should now contain the HTML (print it to check). See here for more details.
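In Python 3, urllib2 was folded into urllib.request and response bodies are bytes, so io.BytesIO takes the place of StringIO. A minimal sketch of the same approach under that assumption, plus the gzip.decompress shortcut available since Python 3.2 (the function names gunzip_bytes and gunzip_bytes_short are illustrative, not from the answer):

```python
import gzip
import io


def gunzip_bytes(data: bytes) -> bytes:
    """Decompress a gzip-encoded body, mirroring the StringIO approach above."""
    # GzipFile wants a file-like object, so wrap the raw bytes in BytesIO.
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
        return gz.read()


def gunzip_bytes_short(data: bytes) -> bytes:
    """Same result in one call (Python 3.2+)."""
    return gzip.decompress(data)


if __name__ == "__main__":
    compressed = gzip.compress(b"<html><body>hi</body></html>")
    print(gunzip_bytes(compressed).decode("utf-8"))
    # prints <html><body>hi</body></html>
```

Both functions return bytes; decoding to str is a separate step that depends on the page's charset.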

Answer 1 (score: -2)

import gzip

def ungzip(r, b):
    """Gunzip mechanize response r in place and hand it back to Browser b."""
    headers = r.info()
    # Header lookup is case-insensitive, so one check covers both spellings
    if headers.get('Content-Encoding') == 'gzip':
        gz = gzip.GzipFile(fileobj=r, mode='rb')
        html = gz.read()
        gz.close()
        # Hard-codes UTF-8; adjust if the server declares a different charset
        headers['Content-type'] = 'text/html; charset=utf-8'
        r.set_data(html)
        b.set_response(r)
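Answer 1 is tied to mechanize: r and b are presumably a mechanize response and Browser, since set_data and set_response come from mechanize. For plain urllib in Python 3, a hedged sketch of the same idea follows. It assumes the server honours an Accept-Encoding: gzip request header, and it reads the charset from Content-Type rather than hard-coding UTF-8, falling back to UTF-8 only when none is declared. decode_body and fetch_html are hypothetical helper names.

```python
import gzip
import urllib.request
from email.message import Message


def decode_body(body: bytes, headers: Message, default_charset: str = "utf-8") -> str:
    """Gunzip the body if the headers say it is gzip-encoded, then decode to text."""
    # Lookup on email.message.Message (and http.client.HTTPMessage) is
    # case-insensitive, so one .get() covers 'Content-Encoding' in any casing.
    if (headers.get("Content-Encoding") or "").lower() == "gzip":
        body = gzip.decompress(body)
    # Prefer the charset declared in Content-Type; default_charset is an assumption.
    charset = headers.get_content_charset() or default_charset
    return body.decode(charset, errors="replace")


def fetch_html(url: str) -> str:
    """Fetch a page, explicitly asking for gzip; urllib does not decompress for you."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as response:
        # response.headers is an http.client.HTTPMessage, a Message subclass.
        return decode_body(response.read(), response.headers)
```

Keeping the decoding in a separate function lets it be checked offline; fetch_html itself needs a live server.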