Trying to extract a split ZIP file with Python's zipfile library by concatenating the parts first, but I keep getting a "Bad magic number for file header" error
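I'm writing a Python script that normally receives a single ZIP file, but occasionally receives a ZIP file split into multiple parts (e.g. foo.zip.001, foo.zip.002, etc.). As far as I can tell, there's no simple way to handle this when the script has to be bundled with its dependencies in a Docker container. However, I stumbled upon this SO answer, which explains that you can concatenate the parts into a single ZIP file and process that. So my plan of attack is to merge all the split parts into one big ZIP file and then extract it. I created a test case from a video file with the following command (in the Mac Terminal):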
$ zip -s 5m test ch4_3.mp4
Here is the code that concatenates all the parts:
import zipfile
split_files = ['test.z01', 'test.z02', 'test.z03', 'test.zip']
with open('test_video.zip', 'wb') as f:
    for file in split_files:
        with open(file, 'rb') as zf:
            f.write(zf.read())
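For reference, the same concatenation step can be written to find the parts by itself and stream them, instead of reading each part fully into memory. This is a minimal sketch. It assumes the Info-ZIP naming that zip -s produces: test.z01, test.z02 and so on, with test.zip (the part holding the central directory) last. The helper names ordered_split_parts and concat_split_parts are hypothetical. The sketch only reproduces the concatenation and does nothing to make the result readable by zipfile.

```python
import glob
import os
import shutil


def ordered_split_parts(stem):
    """Return the parts of an Info-ZIP split archive in disk order.

    Assumption: naming as produced by `zip -s`, i.e. stem.z01, stem.z02, ...
    followed by stem.zip as the final part.
    """
    numbered = glob.glob(glob.escape(stem) + ".z[0-9][0-9]*")
    # Sort numerically so that .z10 comes after .z09 rather than after .z01.
    numbered.sort(key=lambda p: int(os.path.splitext(p)[1][2:]))
    return numbered + [stem + ".zip"]


def concat_split_parts(stem, out_path):
    """Concatenate the parts into out_path, copying in chunks."""
    with open(out_path, "wb") as out:
        for part in ordered_split_parts(stem):
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return out_path
```

For this test case, concat_split_parts('test', 'test_video.zip') should write the parts in the same order as the hard-coded list above.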
If I go to the terminal and run unzip test_video.zip, the output is:
$ unzip test_video.zip
Archive: test_video.zip
warning [test_video.zip]: zipfile claims to be last disk of a multi-part archive;
attempting to process anyway, assuming all parts have been concatenated
together in order. Expect "errors" and warnings...true multi-part support
doesn't exist yet (coming soon).
warning [test_video.zip]: 15728640 extra bytes at beginning or within zipfile
(attempting to process anyway)
file #1: bad zipfile offset (local header sig): 15728644
(attempting to re-compensate)
inflating: ch4_3.mp4
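The first warning concerns the end-of-central-directory (EOCD) record, which sits at the end of the last part (test.zip). In a split archive, that record's "number of this disk" field is non-zero. The central-directory offset it stores is counted from the start of the disk where the directory begins, not from the start of the concatenated file.

The sketch below reads those fields with struct. It assumes the EOCD layout described in PKWARE's APPNOTE.TXT and ignores ZIP64. read_eocd is a hypothetical helper name.

```python
import struct

# EOCD layout (ZIP64 ignored): signature, number of this disk, disk where the
# central directory starts, entries on this disk, total entries,
# central directory size, central directory offset, comment length.
EOCD_FORMAT = "<4s4H2LH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes
EOCD_SIGNATURE = b"PK\x05\x06"


def read_eocd(path):
    """Return the EOCD fields of the ZIP file at path as a dict."""
    with open(path, "rb") as fh:
        fh.seek(0, 2)
        size = fh.tell()
        # The record lies within 22 bytes + a 65535-byte comment of the end.
        fh.seek(max(0, size - EOCD_SIZE - 0xFFFF))
        tail = fh.read()
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0 or len(tail) - pos < EOCD_SIZE:
        raise ValueError("no end-of-central-directory record found")
    (_, this_disk, cd_disk, entries_here, entries_total,
     cd_size, cd_offset, comment_len) = struct.unpack(
        EOCD_FORMAT, tail[pos:pos + EOCD_SIZE])
    return {
        "this_disk": this_disk,
        "cd_start_disk": cd_disk,
        "entries_on_disk": entries_here,
        "entries_total": entries_total,
        "cd_size": cd_size,
        "cd_offset": cd_offset,  # relative to the start of cd_start_disk
        "comment_length": comment_len,
    }
```

For example, read_eocd('test.zip')['this_disk'] should be non-zero for the last part of the split test archive, while an ordinary single-file ZIP reports 0.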
It struggles a bit, but it works. However, when I try to run the following code:
if not os.path.exists('output'):
    os.mkdir('output')
with zipfile.ZipFile('tester/test_video.zip', 'r') as z:
    z.extractall('output')
I get the following error:
---------------------------------------------------------------------------
BadZipFile Traceback (most recent call last)
<ipython-input-60-07a6f56ea685> in <module>()
2 os.mkdir('output')
3 with zipfile.ZipFile('tester/test_video.zip', 'r') as z:
----> 4 z.extractall('output')
~/anaconda3/lib/python3.6/zipfile.py in extractall(self, path, members, pwd)
1499
1500 for zipinfo in members:
-> 1501 self._extract_member(zipinfo, path, pwd)
1502
1503 @classmethod
~/anaconda3/lib/python3.6/zipfile.py in _extract_member(self, member, targetpath, pwd)
1552 return targetpath
1553
-> 1554 with self.open(member, pwd=pwd) as source, \
1555 open(targetpath, "wb") as target:
1556 shutil.copyfileobj(source, target)
~/anaconda3/lib/python3.6/zipfile.py in open(self, name, mode, pwd, force_zip64)
1371 fheader = struct.unpack(structFileHeader, fheader)
1372 if fheader[_FH_SIGNATURE] != stringFileHeader:
-> 1373 raise BadZipFile("Bad magic number for file header")
1374
1375 fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])
BadZipFile: Bad magic number for file header
If I put the .zip file first instead, this is what I get:
split_files = ['test.zip', 'test.z01', 'test.z02', 'test.z03']
with open('test_video.zip', 'wb') as f:
    for file in split_files:
        with open(file, 'rb') as zf:
            f.write(zf.read())
with zipfile.ZipFile('test_video.zip', 'r') as z:
    z.extractall('output')
Here is the output:
---------------------------------------------------------------------------
BadZipFile Traceback (most recent call last)
<ipython-input-14-f7aab706dbed> in <module>()
1 if not os.path.exists('output'):
2 os.mkdir('output')
----> 3 with zipfile.ZipFile('test_video.zip', 'r') as z:
4 z.extractall('output')
~/anaconda3/lib/python3.6/zipfile.py in __init__(self, file, mode, compression, allowZip64)
1106 try:
1107 if mode == 'r':
-> 1108 self._RealGetContents()
1109 elif mode in ('w', 'x'):
1110 # set the modified flag so central directory gets written
~/anaconda3/lib/python3.6/zipfile.py in _RealGetContents(self)
1173 raise BadZipFile("File is not a zip file")
1174 if not endrec:
-> 1175 raise BadZipFile("File is not a zip file")
1176 if self.debug > 1:
1177 print(endrec)
BadZipFile: File is not a zip file
Using the answer to this SO question, I found that the header is b'PK\x07\x08', but I don't know why. I also ran the testzip() function, and it points straight at the culprit: ch4_3.mp4.
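For context, b'PK\x07\x08' is the little-endian encoding of 0x08074b50. PKWARE's APPNOTE.TXT uses that value as the data-descriptor signature. It also uses it as the "spanning signature" that begins the first segment of a split archive, immediately followed by the first local file header (b'PK\x03\x04').

The unzip output looks consistent with that layout. 15728640 is exactly three 5 MiB parts (3 × 5,242,880 bytes), and the local header unzip went looking for is 4 bytes beyond that, at 15728644.

A minimal diagnostic sketch can check which of these a file starts with. describe_first_part is a hypothetical helper name, and the two byte constants are the standard ZIP signatures.

```python
# ZIP signatures as they appear on disk (little-endian 32-bit values).
SPANNING_SIG = b"PK\x07\x08"      # 0x08074b50
LOCAL_HEADER_SIG = b"PK\x03\x04"  # 0x04034b50


def describe_first_part(path):
    """Return "split", "plain" or "unknown" based on the first 8 bytes.

    "split": spanning signature followed by a local file header.
    "plain": a local file header right at offset 0.
    """
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head[:4] == SPANNING_SIG and head[4:8] == LOCAL_HEADER_SIG:
        return "split"
    if head[:4] == LOCAL_HEADER_SIG:
        return "plain"
    return "unknown"
```

Running it on test.z01 and on an ordinary ZIP shows whether the marker is present. It only diagnoses the layout and does not repair the archive.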
You can find the ZIP file in question at this link here. Any ideas on what to do?