OSError: Invalid data stream when trying to decompress bz2 chunks in parallel

Asked: 2019-06-30 21:50:14

Tags: python python-3.x multiprocessing bz2

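I am trying to read a very large bz2 file from a URL in chunks and decompress each chunk separately, in parallel. When I decompress a chunk outside the worker function, it works fine. However, when a child process tries to decompress the same chunk, it raises OSError: Invalid data stream.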

The code below is the complete code. I am running Python 3.5.2.

import bz2
import urllib3
import multiprocessing as mp

def parse_chunk():
    decompressor = bz2.BZ2Decompressor()
    global q
    while True:
        chunk = q.get()
        if chunk is None:
            break
        # Decompression here fails
        decompressed_chunk = decompressor.decompress(chunk).decode("utf-8")

decompressor_main = bz2.BZ2Decompressor()
http = urllib3.PoolManager()
r = http.request(
     'GET',
     'https://url_to_file.bz2',
     preload_content=False)
last_line = False

q = mp.Queue(maxsize=5)
pool = mp.Pool(5, initializer=parse_chunk)
for chunk in r.stream(1024*100):
    # Decompression here works
    decompressed_chunk = decompressor_main.decompress(chunk).decode("utf-8")
    q.put(chunk)
q.put(None)
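
One possible explanation can be checked without any network access. A bz2 file is a single stream, so a chunk cut at an arbitrary byte offset is only meaningful to a decompressor that has already seen every earlier byte. In the code above each worker owns its own BZ2Decompressor and takes whichever chunk comes off the queue next, so most workers would start in the middle of the stream. The sketch below is a minimal reproduction under that assumption; `payload`, `chunks`, `in_order` and `fresh` are hypothetical names, and the random bytes merely stand in for the remote file.

```python
import bz2
import random

# Hypothetical stand-in for the remote file: random bytes barely compress,
# so the compressed stream is guaranteed to span several chunks.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(50000))
compressed = bz2.compress(payload)

# Cut the single bz2 stream at arbitrary byte offsets, much like r.stream() does.
chunk_size = 4096
chunks = [compressed[i:i + chunk_size]
          for i in range(0, len(compressed), chunk_size)]

# One decompressor fed every chunk in order: this is the main-process case.
in_order = bz2.BZ2Decompressor()
restored = b"".join(in_order.decompress(c) for c in chunks)

# A fresh decompressor handed a mid-stream chunk: this is what a worker
# sees when it did not also receive all of the preceding chunks.
fresh = bz2.BZ2Decompressor()
error = None
try:
    fresh.decompress(chunks[1])
except OSError as exc:
    error = exc

print(restored == payload)
print(repr(error))
```

If that is indeed the cause, the decompression itself has to stay sequential in one place, and only the already decompressed data can be handed to the workers for parsing.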

0 answers:

No answers