"Bad magic number for file header" error when using zipfile in Python 3.7 to process a concatenated zip file

Date: 2019-01-09 04:54:54

Tags: python zip zipfile

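I am trying to use the Python zipfile library to unzip a split ZIP file by concatenating all of the splits and then unzipping the result, but with this library I keep running into a "Bad magic number for file header" error.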

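I am writing a Python script that will usually receive a single ZIP file, but on rare occasions will receive a ZIP file that has been split into multiple parts (e.g. foo.zip.001, foo.zip.002, etc.). As far as I can tell, there is no easy way to handle this if the script's dependencies have to be bundled for a Docker container. However, I stumbled upon this SO answer, which explains that you can concatenate the parts into a single ZIP file and process that. So my battle plan is to concatenate all of the splits into one big ZIP file and then unzip it. I created a test case from a video file with the following command (in the Mac terminal):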

$ zip -s 5m test ch4_3.mp4

Here is the code that concatenates all of the files:

import zipfile

split_files = ['test.z01', 'test.z02', 'test.z03', 'test.zip']

with open('test_video.zip', 'wb') as f:
    for file in split_files:
        with open(file, 'rb') as zf:
            f.write(zf.read())
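
A lower-memory variant of the same concatenation step, as a minimal sketch: it streams each segment with shutil.copyfileobj instead of reading the whole segment into memory. The function name concatenate_parts is hypothetical, and the segment order (.z01, .z02, .z03, then .zip) is assumed from the list above.

```python
import shutil


def concatenate_parts(part_paths, dest_path):
    """Append each segment to dest_path in the order given (hypothetical helper)."""
    with open(dest_path, 'wb') as dest:
        for part_path in part_paths:
            with open(part_path, 'rb') as part:
                # Copy in chunks so a large segment is never fully held in memory.
                shutil.copyfileobj(part, dest)
    return dest_path
```

It would be called as concatenate_parts(['test.z01', 'test.z02', 'test.z03', 'test.zip'], 'test_video.zip'). The bytes written are the same as in the loop above; only the memory behaviour differs.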

If I go to the terminal and run unzip test_video.zip, the output is:

$ unzip test_video.zip
Archive:  test_video.zip
warning [test_video.zip]:  zipfile claims to be last disk of a multi-part archive;
  attempting to process anyway, assuming all parts have been concatenated
  together in order.  Expect "errors" and warnings...true multi-part support
  doesn't exist yet (coming soon).
warning [test_video.zip]:  15728640 extra bytes at beginning or within zipfile
  (attempting to process anyway)
file #1:  bad zipfile offset (local header sig):  15728644
  (attempting to re-compensate)
  inflating: ch4_3.mp4

It seems to struggle a bit, but it succeeds. However, when I try to run the following code:

import os

if not os.path.exists('output'):
    os.mkdir('output')
with zipfile.ZipFile('tester/test_video.zip', 'r') as z:
    z.extractall('output')

I get the following error:

---------------------------------------------------------------------------
BadZipFile                                Traceback (most recent call last)
<ipython-input-60-07a6f56ea685> in <module>()
      2     os.mkdir('output')
      3 with zipfile.ZipFile('tester/test_video.zip', 'r') as z:
----> 4     z.extractall('output')

~/anaconda3/lib/python3.6/zipfile.py in extractall(self, path, members, pwd)
   1499 
   1500         for zipinfo in members:
-> 1501             self._extract_member(zipinfo, path, pwd)
   1502 
   1503     @classmethod

~/anaconda3/lib/python3.6/zipfile.py in _extract_member(self, member, targetpath, pwd)
   1552             return targetpath
   1553 
-> 1554         with self.open(member, pwd=pwd) as source, \
   1555              open(targetpath, "wb") as target:
   1556             shutil.copyfileobj(source, target)

~/anaconda3/lib/python3.6/zipfile.py in open(self, name, mode, pwd, force_zip64)
   1371             fheader = struct.unpack(structFileHeader, fheader)
   1372             if fheader[_FH_SIGNATURE] != stringFileHeader:
-> 1373                 raise BadZipFile("Bad magic number for file header")
   1374 
   1375             fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])

BadZipFile: Bad magic number for file header
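
The traceback shows that this error is raised when the four bytes at a member's recorded header offset are not the local file header signature. The following is a minimal diagnostic sketch, not a fix: check_local_headers is a hypothetical helper that reads the four bytes at each member's ZipInfo.header_offset and compares them with b'PK\x03\x04', the local file header signature from the ZIP format. Note that, as far as I know, the offsets zipfile reports have already been adjusted for any data it believes was prepended to the archive.

```python
import zipfile

LOCAL_HEADER_SIGNATURE = b'PK\x03\x04'


def check_local_headers(zip_path):
    """Return (name, offset, first_four_bytes, ok) for every archive member."""
    results = []
    with zipfile.ZipFile(zip_path, 'r') as z, open(zip_path, 'rb') as raw:
        for info in z.infolist():
            # Look at the raw bytes where zipfile expects the local header.
            raw.seek(info.header_offset)
            signature = raw.read(4)
            results.append((info.filename, info.header_offset, signature,
                            signature == LOCAL_HEADER_SIGNATURE))
    return results
```

Run against the concatenated test_video.zip, a False in the last field would point at the member whose recorded offset does not land on a local file header.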

If I try it with the .zip file first instead, this is what I get:

split_files = ['test.zip', 'test.z01', 'test.z02', 'test.z03']

with open('test_video.zip', 'wb') as f:
    for file in split_files:
        with open(file, 'rb') as zf:
            f.write(zf.read())

with zipfile.ZipFile('test_video.zip', 'r') as z:
    z.extractall('output')

Here is the output:

---------------------------------------------------------------------------
BadZipFile                                Traceback (most recent call last)
<ipython-input-14-f7aab706dbed> in <module>()
      1 if not os.path.exists('output'):
      2     os.mkdir('output')
----> 3 with zipfile.ZipFile('test_video.zip', 'r') as z:
      4     z.extractall('output')

~/anaconda3/lib/python3.6/zipfile.py in __init__(self, file, mode, compression, allowZip64)
   1106         try:
   1107             if mode == 'r':
-> 1108                 self._RealGetContents()
   1109             elif mode in ('w', 'x'):
   1110                 # set the modified flag so central directory gets written

~/anaconda3/lib/python3.6/zipfile.py in _RealGetContents(self)
   1173             raise BadZipFile("File is not a zip file")
   1174         if not endrec:
-> 1175             raise BadZipFile("File is not a zip file")
   1176         if self.debug > 1:
   1177             print(endrec)

BadZipFile: File is not a zip file
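
One plausible reading of this second error: zipfile looks for the end-of-central-directory record only near the end of the file, and with test.zip written first that record is followed by megabytes of data from the other segments. The sketch below reproduces the effect on a small throwaway archive. The function name, file names, and padding size are all made up for illustration.

```python
import os
import tempfile
import zipfile


def demo_trailing_data(padding_size=200000):
    """Build a tiny archive, append padding, and report is_zipfile before/after."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, 'demo.zip')
    with zipfile.ZipFile(path, 'w') as z:
        z.writestr('hello.txt', 'hello')
    before = zipfile.is_zipfile(path)
    # Append more bytes than the maximum ZIP comment length (65535), so the
    # end-of-central-directory record is no longer near the end of the file.
    with open(path, 'ab') as f:
        f.write(b'\0' * padding_size)
    after = zipfile.is_zipfile(path)
    return before, after
```

I would expect the first value to be True and the second False, which matches the "File is not a zip file" message above.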

Using the answer to this SO question, I found that the header is b'PK\x07\x08', but I don't know why. I also used the testzip() function, and it points straight at the culprit: ch4_3.mp4
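
For context, in the ZIP format b'PK\x07\x08' is the data descriptor signature, and the same four bytes are also written as a marker at the start of the first segment of a split archive. That would be consistent with the 4-byte shift in the unzip output (15728644 = 15728640 + 4), though this is an inference and has not been checked against the linked file. A minimal sketch for naming the first four bytes of a file follows; describe_first_signature and the ZIP_SIGNATURES table are hypothetical names, while the signature values come from the ZIP format.

```python
ZIP_SIGNATURES = {
    b'PK\x03\x04': 'local file header',
    b'PK\x01\x02': 'central directory file header',
    b'PK\x05\x06': 'end of central directory record',
    b'PK\x06\x06': 'ZIP64 end of central directory record',
    b'PK\x06\x07': 'ZIP64 end of central directory locator',
    b'PK\x07\x08': 'data descriptor / split-archive marker',
}


def describe_first_signature(path):
    """Return the first four bytes of a file and a label for them."""
    with open(path, 'rb') as f:
        magic = f.read(4)
    return magic, ZIP_SIGNATURES.get(magic, 'unknown')
```

Calling it on test.z01 and on the concatenated test_video.zip would show whether both begin with the split-archive marker rather than a local file header.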

You can find the ZIP file in question at this link here. Any ideas on what to do?

0 answers:

No answers