下载并解压缩内存中的gzip压缩文件?

时间:2013-03-12 03:16:59

标签: python file gzip urllib2 stringio

我想使用urllib下载文件并在保存之前将文件解压缩到内存中。

这就是我现在所拥有的:

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
outfile = open(outFilePath, 'w')
outfile.write(decompressedFile.read())

这最终会写出空文件。我怎样才能实现我的目标?

更新答案:

#! /usr/bin/env python2
import urllib2
import StringIO
import gzip

baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
filename = "man-pages-3.34.tar.gz"
outFilePath = filename[:-3]

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO(response.read())
decompressedFile = gzip.GzipFile(fileobj=compressedFile)

with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())

4 个答案:

答案 0 :(得分:45)

您需要在写入compressedFile之后但在将其传递给gzip.GzipFile()之前进行搜索。否则它将由gzip模块从末尾读取,并将显示为空文件。见下文:

#! /usr/bin/env python
import urllib2
import StringIO
import gzip

baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
filename = "man-pages-3.34.tar.gz"
outFilePath = "man-pages-3.34.tar"

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
#
# Set the file's current position to the beginning
# of the file so that gzip.GzipFile can read
# its contents from the top.
#
compressedFile.seek(0)

decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')

with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())

答案 1 :(得分:12)

对于那些使用Python 3的人来说,等效的答案是:

import urllib.request
import io
import gzip

response = urllib.request.urlopen(FILE_URL)
compressed_file = io.BytesIO(response.read())
decompressed_file = gzip.GzipFile(fileobj=compressed_file)

with open(OUTFILE_PATH, 'wb') as outfile:
    outfile.write(decompressed_file.read())

答案 2 :(得分:9)

如果你有Python 3.2或更高版本,生活会更容易:

#!/usr/bin/env python3
import gzip
import urllib.request

baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
filename = "man-pages-4.03.tar.gz"
outFilePath = filename[:-3]

response = urllib.request.urlopen(baseURL + filename)
with open(outFilePath, 'wb') as outfile:
    outfile.write(gzip.decompress(response.read()))

对于对历史感兴趣的人,请参阅https://bugs.python.org/issue3488https://hg.python.org/cpython/rev/3fa0a9553402

答案 3 :(得分:-1)

打印解压缩文件内容的一行代码:

print gzip.GzipFile(fileobj=StringIO.StringIO(urllib2.urlopen(DOWNLOAD_LINK).read()), mode='rb').read()