Reading zip files inside a tar file

Time: 2018-01-08 14:57:13

Tags: python zip tar

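I am trying to extract some XML files from zip files that are stored inside a tar file. Specifically, there is one large tar file containing multiple zip files, and each of those zip files contains another zip file that holds the XML files.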

import tarfile, os
import sys
from zipfile import ZipFile

os.chdir("C://.../temp/foo")
tar = tarfile.open("C://....")
for member in tar.getmembers():
    if member.name.endswith(".zip"):
        f=tar.extractfile(member)
        content=ZipFile(f, 'r')
        content = content.extract(content)
        tar.close()

The script above does not extract the files correctly.

1 answer:

Answer 0 (score: 1)

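As you will have noticed, the second-level archive comes back as a ZipExtFile rather than a ZipFile, so a little extra work is needed to turn it into a proper ZipFile.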

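The key is that the second-level ZipFile must be instantiated from a byte-stream object (io.BytesIO); after that it behaves normally. I created a test file matching your description (tar > zip > zip > text file) and the code below worked fine on it. If your zip files are nested more deeply, the code can be generalized further.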

import tarfile
from zipfile import ZipFile
import io

mytar = tarfile.open('mytar.tar')
print('Opening tar file, members:')
for member in mytar.getnames():
    print('>%s'%member)
    if member.endswith('zip'):
        # get the tarfile object
        tf = mytar.extractfile(member)
        # this is what the first-level ZipFile will be
        with ZipFile(tf) as myzip1:
            print(myzip1.namelist())
            # now let's get at those second-level ZipFiles, which currently exist as ZipExtFile 
            for zipfile2name in myzip1.namelist():
                # read the file into bytes
                zipfile2bytes = myzip1.read(zipfile2name)
                # get a bytestream
                f = io.BytesIO(zipfile2bytes)
                # now instantiate a ZipFile object; the with block closes it when done
                with ZipFile(f) as zipfile2:
                    # now we can use it like a proper ZipFile
                    print(zipfile2.namelist())
                    for textfile in zipfile2.namelist():
                        with zipfile2.open(textfile) as myfile:
                            print(myfile.read())


print('--finished--')
mytar.close()
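
For deeper nesting, one possible generalization is to recurse whenever an archive member is itself a zip file. The following is only a sketch under some assumptions: the helper names iter_nested_files and iter_tar_zip_files are made up for illustration, nested archives are recognized purely by a ".zip" suffix, and every archive is small enough to be read fully into memory.

```python
import io
import tarfile
from zipfile import ZipFile


def iter_nested_files(zf, prefix=""):
    """Yield (path, data) for each non-zip file in zf, descending into nested zips.

    Hypothetical helper: nested archives are detected by their ".zip" suffix only.
    """
    for info in zf.infolist():
        if info.is_dir():
            continue
        data = zf.read(info)  # raw bytes of this member
        path = prefix + info.filename
        if info.filename.lower().endswith(".zip"):
            # Same trick as in the answer: wrap the bytes in BytesIO so that
            # ZipFile receives a seekable byte stream.
            with ZipFile(io.BytesIO(data)) as inner:
                yield from iter_nested_files(inner, path + "/")
        else:
            yield path, data


def iter_tar_zip_files(tar_path):
    """Yield (path, data) for every file found in the zip members of a tar file."""
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.lower().endswith(".zip"):
                data = tar.extractfile(member).read()
                with ZipFile(io.BytesIO(data)) as outer:
                    yield from iter_nested_files(outer, member.name + "/")
```

With these helpers, the XML files from the question could be picked out by looping over iter_tar_zip_files('mytar.tar') and keeping only the entries whose path ends with ".xml"; each entry carries the file content as bytes, which can then be parsed or written to disk.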