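I am trying to extract some XML files from zip files that are inside a tar file. There is one large tar file that contains several zip files, and each of those zip files contains another zip file holding the XML files.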
import tarfile, os
import sys
from zipfile import ZipFile

os.chdir("C://.../temp/foo")
tar = tarfile.open("C://....")
for member in tar.getmembers():
    if member.name.endswith(".zip"):
        f = tar.extractfile(member)
        content = ZipFile(f, 'r')
        content = content.extract(content)
tar.close()
The script above does not extract the files correctly.
Answer 0 (score: 1)
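As you will notice, you have to do some extra work to get the second-level zip as a ZipFile rather than a ZipExtFile.
The key is that the second-level ZipFile has to be instantiated from a byte-stream object before it behaves properly. I created a test file matching your description (tar - zip - zip - textfile) and it worked fine. If your zip files are nested more deeply, you can generalize the code further.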
import tarfile
from zipfile import ZipFile
import io

mytar = tarfile.open('mytar.tar')
print('Opening tar file, members:')
for member in mytar.getnames():
    print('>%s' % member)
    if member.endswith('.zip'):
        # get a file-like object for this tar member
        tf = mytar.extractfile(member)
        # this is the first-level ZipFile
        with ZipFile(tf) as myzip1:
            print(myzip1.namelist())
            # now get at the second-level zip files, which open() would only give us as ZipExtFile
            for zipfile2name in myzip1.namelist():
                # read the nested zip into bytes
                zipfile2bytes = myzip1.read(zipfile2name)
                # wrap the bytes in a seekable byte stream
                f = io.BytesIO(zipfile2bytes)
                # now instantiate a ZipFile object from the stream
                with ZipFile(f) as zipfile2:
                    # now we can use it like a proper ZipFile
                    print(zipfile2.namelist())
                    for textfile in zipfile2.namelist():
                        with zipfile2.open(textfile) as myfile:
                            print(myfile.read())
print('--finished--')
mytar.close()
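
If the nesting can go deeper than two levels, one possible generalization is to recurse whenever a member's name ends in `.zip`. The following is only a sketch under some assumptions: the helper names (`iter_nested_files`, `iter_tar_nested_files`, `build_sample_tar`) and the sample archive contents are made up for illustration, nested archives are recognized purely by their `.zip` suffix, and every archive is read fully into memory, which may not suit very large files.

```python
import io
import tarfile
import zipfile


def iter_nested_files(zip_bytes, suffix=".xml", prefix=""):
    """Yield (path, data) for members ending in `suffix`, descending into nested zips."""
    # Wrapping the raw bytes in BytesIO gives ZipFile the seekable stream it needs.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            data = zf.read(name)
            path = prefix + name
            if name.lower().endswith(".zip"):
                # Assumption: anything named *.zip is itself a zip archive.
                yield from iter_nested_files(data, suffix, path + "/")
            elif name.lower().endswith(suffix):
                yield path, data


def iter_tar_nested_files(tar_fileobj, suffix=".xml"):
    """Walk every zip member of a tar archive and yield the matching nested files."""
    with tarfile.open(fileobj=tar_fileobj) as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.lower().endswith(".zip"):
                data = tar.extractfile(member).read()
                yield from iter_nested_files(data, suffix, member.name + "/")


def build_sample_tar():
    """Build a hypothetical in-memory tar -> zip -> zip -> xml archive for the demo."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as zf:
        zf.writestr("data.xml", "<root>hello</root>")
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w") as zf:
        zf.writestr("inner.zip", inner.getvalue())
    payload = outer.getvalue()
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tar:
        info = tarfile.TarInfo("outer.zip")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    tar_buf.seek(0)
    return tar_buf


found = list(iter_tar_nested_files(build_sample_tar()))
for path, data in found:
    print(path, data)
```

To apply this to a tar file on disk, you could pass `open('mytar.tar', 'rb')` as `tar_fileobj` and write each yielded `data` to a file of your choosing instead of printing it.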