Question

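I have a multi-GB 7z archive containing a single XML file. I want to read this compressed file one line at a time until it reaches EOF, using Python 3.4. I cannot extract it to its full size, which is several TB.

I have been advised to use libraries such as pylzma and lzma, but they do not support the 7z format. I believe libarchive does support 7z, but it reads blocks, and those blocks are not necessarily lines of text in the file.

Please advise. Thanks.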

Answer 1

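(Elaborating on the yield part.) Note that I do not know this library, or which function you use to get a block of uncompressed data. But what I mean is something like this: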

import py7zlib

def readline_7z(filename):  # a Python identifier cannot start with a digit
    with open(filename, 'rb') as fh:  # the file handle is closed automatically when finished
        archive = py7zlib.Archive7z(fh)
        current_line = b''  # uncompressed data arrives as bytes, so buffer bytes
        # getblock() is a placeholder: I do not know how you get a block of
        # uncompressed data, so I abstract the call, you get the idea...
        for block in archive.getblock():
            current_line += block
            while b'\n' in current_line:
                end = current_line.index(b'\n') + 1
                yield current_line[:end]  # gives everything up to and including '\n' to the caller
                current_line = current_line[end:]  # keep the rest of the block for the next line
        if current_line:
            yield current_line  # last line of the file, if it has no trailing newline

You can then use it like this:

for line in readline_7z('myfile.7z'):
    print(line)
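The buffering idea can be tried without any archive library at all. Below is a minimal sketch, assuming only that the uncompressed data arrives as an iterable of bytes blocks; `iter_lines` and `sample_blocks` are hypothetical names made up for illustration, not part of py7zlib or libarchive.

```python
def iter_lines(blocks):
    """Yield complete lines (bytes, newline included) from an iterable of byte blocks."""
    buffer = b""
    for block in blocks:
        buffer += block
        # A single block may contain zero, one, or several newlines.
        while True:
            end = buffer.find(b"\n")
            if end == -1:
                break  # no complete line yet, wait for the next block
            yield buffer[:end + 1]
            buffer = buffer[end + 1:]
    if buffer:
        # Whatever is left is the last line, which had no trailing newline.
        yield buffer


# Hypothetical blocks whose boundaries do not line up with line boundaries.
sample_blocks = [b"<root>\n<a>", b"1</a>\n<b>2</b>", b"\n</root>"]
lines = list(iter_lines(sample_blocks))
```

Only the current partial line plus one block is held in memory, so this should stay small even for a multi-TB file, as long as individual lines are not enormous.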

If someone who knows the library can fill in the correct call, edits are welcome.
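Another way to deal with a library that hands out blocks rather than lines is to wrap the blocks in a file-like object and let the standard `io` module do the line splitting and decoding. This is only a sketch under the same assumption as before (an iterable of bytes blocks); `ChunkReader` and `text_lines` are hypothetical names, and UTF-8 is assumed for the XML.

```python
import io


class ChunkReader(io.RawIOBase):
    """Expose an iterable of bytes blocks as a read-only raw stream."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._pending = b""

    def readable(self):
        return True

    def readinto(self, buf):
        # Refill from the next non-empty block; returning 0 signals EOF.
        while not self._pending:
            try:
                self._pending = next(self._chunks)
            except StopIteration:
                return 0
        n = min(len(buf), len(self._pending))
        buf[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n


def text_lines(chunks, encoding="utf-8"):
    """Return a text stream that can be iterated line by line."""
    return io.TextIOWrapper(io.BufferedReader(ChunkReader(chunks)), encoding=encoding)


decoded = list(text_lines([b"a\nb", b"c\n", b"d"]))
```

The returned object behaves like an ordinary text file opened for reading, so `for line in text_lines(blocks):` or `readline()` can be used in the usual way.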
