Question

I want to decompress the data from a bz2 URL directly into a target file. Here is the code:

import bz2
import urllib2

filename = 'temp.file'
req = urllib2.urlopen('http://example.com/file.bz2')
CHUNK = 16 * 1024
with open(filename, 'wb') as fp:
  while True:
    chunk = req.read(CHUNK)
    if not chunk: break
    fp.write(bz2.decompress(chunk))
fp.close()

bz2.decompress(chunk) raises an error: ValueError: couldn't find end of stream
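The error comes from `bz2.decompress()` being a one-shot function: it expects a complete bz2 stream, including the end-of-stream marker, while a 16 KiB slice of a larger download contains only part of it. The Python 3 sketch below reproduces this without a network connection; the random payload is an assumption chosen so the compressed data spans several chunks, standing in for `file.bz2`.

```python
import bz2
import os

CHUNK = 16 * 1024

# Random bytes barely compress, so the .bz2 data spans several 16 KiB chunks,
# like a real download read with req.read(CHUNK).
payload = os.urandom(100 * 1024)
compressed = bz2.compress(payload)
first_chunk = compressed[:CHUNK]

try:
    # The one-shot API needs the whole stream, end-of-stream marker included.
    bz2.decompress(first_chunk)
    error = None
except ValueError as exc:
    error = exc

print("first chunk:", repr(error))
# The same call succeeds once it is given the complete stream.
print("whole stream OK:", bz2.decompress(compressed) == payload)
```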

Answer 1

Use bz2.BZ2Decompressor to decompress the data sequentially:

import bz2
import urllib2

filename = 'temp.file'
req = urllib2.urlopen('http://example.com/file.bz2')
CHUNK = 16 * 1024

decompressor = bz2.BZ2Decompressor()
with open(filename, 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:
            break
        fp.write(decompressor.decompress(chunk))
req.close()

By the way, you don't need to call fp.close() as long as you use the with statement.
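The same loop can be wrapped in a reusable function that works with any readable file-like source. This is a minimal Python 3.3+ sketch (the `eof` attribute does not exist in Python 2); `decompress_to_file` is a hypothetical name, and the `eof` check is an addition that flags a download which stopped before the stream ended. In-memory `io.BytesIO` objects stand in for the HTTP response and the output file.

```python
import bz2
import io

def decompress_to_file(src, dst, chunk_size=16 * 1024):
    """Read bz2 data from file-like src in chunks and write plain bytes to dst."""
    decompressor = bz2.BZ2Decompressor()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Each call returns whatever output this input made available
        # (possibly b""); the decompressor keeps its state between calls.
        dst.write(decompressor.decompress(chunk))
    if not decompressor.eof:
        # Input ran out before the end-of-stream marker: truncated data.
        raise ValueError("bz2 stream is truncated")

# Usage with in-memory stand-ins for the HTTP response and the output file.
payload = b"example line\n" * 50000
src = io.BytesIO(bz2.compress(payload))
dst = io.BytesIO()
decompress_to_file(src, dst)
print(dst.getvalue() == payload)
```

With Python 3's `urllib.request`, the call would look something like `with urllib.request.urlopen(url) as req, open(filename, 'wb') as fp: decompress_to_file(req, fp)`.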

Answer 2

You should use BZ2Decompressor, which supports incremental decompression. See https://docs.python.org/2/library/bz2.html#bz2.BZ2Decompressor

I haven't debugged this, but it should work like this:

import bz2
import urllib2

filename = 'temp.file'
req = urllib2.urlopen('http://example.com/file.bz2')
CHUNK = 16 * 1024

decompressor = bz2.BZ2Decompressor()

with open(filename, 'wb') as fp:
  while True:
    chunk = req.read(CHUNK)
    if not chunk: break

    decomp = decompressor.decompress(chunk)
    if decomp:
        fp.write(decomp)
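One caveat: a single `BZ2Decompressor` handles exactly one bz2 stream. Once its `eof` flag is set, any further input lands in `unused_data`, and calling `decompress()` again raises `EOFError`. Files produced by parallel compressors such as pbzip2, or by concatenating `.bz2` files, contain several streams back to back. The Python 3.3+ sketch below (the function name `decompress_multistream` is hypothetical) starts a fresh decompressor for each stream; the tiny `chunk_size` in the usage is only there to force stream boundaries to fall inside chunks.

```python
import bz2
import io

def decompress_multistream(src, dst, chunk_size=16 * 1024):
    """Decompress one or more concatenated bz2 streams from src into dst."""
    decompressor = bz2.BZ2Decompressor()
    pending = False  # True while a stream has started but not yet ended
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        while chunk:
            pending = True
            dst.write(decompressor.decompress(chunk))
            if decompressor.eof:
                # Bytes after this stream's end marker begin the next stream.
                chunk = decompressor.unused_data
                decompressor = bz2.BZ2Decompressor()
                pending = False
            else:
                chunk = b""
    if pending:
        raise ValueError("input ended in the middle of a bz2 stream")

# Two independent streams glued together, as pbzip2-style output would be.
data = bz2.compress(b"first stream\n") + bz2.compress(b"second stream\n")
dst = io.BytesIO()
decompress_multistream(io.BytesIO(data), dst, chunk_size=7)
print(dst.getvalue())
```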

Answer 3

Here is a more direct and efficient way, using requests in streaming mode:

import shutil

import requests

filename = 'temp.file'
req = requests.get('http://example.com/file.bz2', stream=True)
with open(filename, 'wb') as fp:
    shutil.copyfileobj(req.raw, fp)
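Note that `shutil.copyfileobj(req.raw, fp)` copies the bytes unchanged, so `temp.file` ends up holding the still-compressed bz2 data. In Python 3.3+, `bz2.BZ2File` also accepts an open file object and decompresses as it is read, so it can sit between `req.raw` and `copyfileobj`. This is a sketch under that assumption; `copy_decompressed` is a hypothetical helper, and an `io.BytesIO` stands in for `req.raw` so the example runs offline.

```python
import bz2
import io
import shutil

def copy_decompressed(raw, dst, chunk_size=16 * 1024):
    """Copy a bz2-compressed file object (e.g. req.raw) to dst, decompressed."""
    # BZ2File wraps the open file object and decompresses on read;
    # closing the BZ2File does not close `raw` itself.
    with bz2.BZ2File(raw) as src:
        shutil.copyfileobj(src, dst, chunk_size)

# In-memory stand-in for req.raw from requests.get(url, stream=True).
payload = b"streamed with requests\n" * 1000
raw = io.BytesIO(bz2.compress(payload))
dst = io.BytesIO()
copy_decompressed(raw, dst)
print(dst.getvalue() == payload)
```

With requests this would be roughly `with open(filename, 'wb') as fp: copy_decompressed(req.raw, fp)`. `BZ2File` in Python 3.3+ also reads multi-stream files, so no manual `unused_data` handling is needed on this path.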

Decompress a bz2 URL in Python without a temporary file
