Python: removing an entry from a zipfile

Date: 2016-04-07 09:58:13

Tags: python python-2.7 zipfile

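I am currently writing an open-source library for a container format, which involves modifying zip archives. For that I use Python's built-in zipfile module. Because of some limitations, I decided to modify the module and ship it with my library. These modifications include a patch from the Python issue tracker for removing entries from a zip file: https://bugs.python.org/issue6818 More specifically, I included zipfile.remove.2.patch from ubershmekel. After some modifications for Python 2.7, the patch works fine according to the unit tests shipped with it.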

However, I run into problems when I remove, add, and remove + add files without closing the zip file in between.

Error
Traceback (most recent call last):
  File "/home/martin/git/pyCombineArchive/tests/test_zipfile.py", line 1590, in test_delete_add_no_close
    self.assertEqual(zf.read(fname), data)
  File "/home/martin/git/pyCombineArchive/combinearchive/custom_zip.py", line 948, in read
    with self.open(name, "r", pwd) as fp:
  File "/home/martin/git/pyCombineArchive/combinearchive/custom_zip.py", line 1003, in open
    % (zinfo.orig_filename, fname))
BadZipFile: File name in directory 'foo.txt' and header 'bar.txt' differ.

This means the zip file itself is fine, but somehow the central directory or the entry headers got messed up. This unit test reproduces the error:
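To narrow down this kind of failure, it can help to compare every central directory entry with the local file header it points to. The following is a minimal sketch using only the standard zipfile and struct modules (written for Python 3). The helper name find_header_mismatches is hypothetical and is not part of zipfile or of the patch; the 30-byte header layout is the one defined by the ZIP format.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a ZIP local file header:
# signature, version, flags, method, time, date, crc, csize, usize, name len, extra len
LOCAL_HEADER = struct.Struct("<4s5H3I2H")
LOCAL_MAGIC = b"PK\x03\x04"


def find_header_mismatches(fileobj):
    """Return (central_name, local_name) pairs whose names disagree.

    local_name is None when no valid local header is found at the offset
    recorded in the central directory. (Hypothetical helper, for debugging.)
    """
    # Read the central directory; closing the ZipFile leaves fileobj open.
    with zipfile.ZipFile(fileobj, "r") as zf:
        infos = zf.infolist()

    mismatches = []
    for info in infos:
        fileobj.seek(info.header_offset)
        raw = fileobj.read(LOCAL_HEADER.size)
        if len(raw) < LOCAL_HEADER.size or raw[:4] != LOCAL_MAGIC:
            mismatches.append((info.orig_filename, None))
            continue
        fields = LOCAL_HEADER.unpack(raw)
        flags, name_length = fields[2], fields[9]
        # Bit 0x800 marks a UTF-8 file name, otherwise cp437 is assumed.
        encoding = "utf-8" if flags & 0x800 else "cp437"
        local_name = fileobj.read(name_length).decode(encoding)
        if local_name != info.orig_filename:
            mismatches.append((info.orig_filename, local_name))
    return mismatches
```

Calling it on a file object opened in binary mode (or an io.BytesIO holding the archive) after a remove/add sequence returns an empty list if every offset in the central directory still points at the matching header.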

def test_delete_add_no_close(self):
    fname_list = ["foo.txt", "bar.txt", "blu.bla", "sup.bro", "rollah"]
    data_list = [''.join([chr(randint(0, 255)) for i in range(100)]) for i in range(len(fname_list))]

    # add some files to the zip
    with zipfile.ZipFile(TESTFN, "w") as zf:
        for fname, data in zip(fname_list, data_list):
            zf.writestr(fname, data)

    for no in range(0, 2):
        with zipfile.ZipFile(TESTFN, "a") as zf:
            zf.remove(fname_list[no])
            zf.writestr(fname_list[no], data_list[no])
            zf.remove(fname_list[no+1])
            zf.writestr(fname_list[no+1], data_list[no+1])

            # read back the re-added files and the files that were moved during the deletes
            for fname, data in zip(fname_list, data_list):
                self.assertEqual(zf.read(fname), data)

My modified zipfile module and the complete unit test file can be found in this gist: https://gist.github.com/FreakyBytes/30a6f9866154d82f1c3863f2e4969cc4

1 answer:

Answer 0 (score: 1)

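After some intensive debugging, I am fairly sure something went wrong when moving the remaining chunks (the ones stored after the deleted file). So I rewrote that part of the code so that it copies these files/chunks one at a time. I also rewrite each file header (to make sure it is valid) and the central directory at the end of the zip file. My remove function now looks like this: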

def remove(self, member):
    """Remove a file from the archive. Only works if the ZipFile was opened
    with mode 'a'."""

    if "a" not in self.mode:
        raise RuntimeError('remove() requires mode "a"')
    if not self.fp:
        raise RuntimeError(
              "Attempt to modify ZIP archive that was already closed")
    fp = self.fp

    # Make sure we have an info object
    if isinstance(member, ZipInfo):
        # 'member' is already an info object
        zinfo = member
    else:
        # Get info object for member
        zinfo = self.getinfo(member)

    # start at the pos of the first member (smallest offset)
    position = min([info.header_offset for info in self.filelist])  # start at the beginning of first file
    for info in self.filelist:
        fileheader = info.FileHeader()
        # is member after delete one?
        if info.header_offset > zinfo.header_offset and info != zinfo:
            # rewrite FileHeader and copy compressed data
            # Skip the file header:
            fp.seek(info.header_offset)
            fheader = fp.read(sizeFileHeader)
            if fheader[0:4] != stringFileHeader:
                raise BadZipFile("Bad magic number for file header")

            fheader = struct.unpack(structFileHeader, fheader)
            fname = fp.read(fheader[_FH_FILENAME_LENGTH])
            if fheader[_FH_EXTRA_FIELD_LENGTH]:
                fp.read(fheader[_FH_EXTRA_FIELD_LENGTH])

            if info.flag_bits & 0x800:
                # UTF-8 filename
                fname_str = fname.decode("utf-8")
            else:
                fname_str = fname.decode("cp437")

            if fname_str != info.orig_filename:
                if not self._filePassed:
                    fp.close()
                raise BadZipFile(
                      'File name in directory %r and header %r differ.'
                      % (info.orig_filename, fname))

            # read the actual data
            data = fp.read(fheader[_FH_COMPRESSED_SIZE])

            # modify info obj
            info.header_offset = position
            # jump to new position
            fp.seek(info.header_offset, 0)
            # write fileheader and data
            fp.write(fileheader)
            fp.write(data)
            if info.flag_bits & _FHF_HAS_DATA_DESCRIPTOR:
                # Write CRC and file sizes after the file data
                fp.write(struct.pack("<LLL", info.CRC, info.compress_size,
                        info.file_size))
            # update position
            fp.flush()
            position = fp.tell()

        elif info != zinfo:
            # move to next position
            position = position + info.compress_size + len(fileheader) + self._get_data_descriptor_size(info)

    # Fix class members with state
    self.start_dir = position
    self._didModify = True
    self.filelist.remove(zinfo)
    del self.NameToInfo[zinfo.filename]

    # write new central directory (includes truncate)
    fp.seek(position, 0)
    self._write_central_dir()
    fp.seek(self.start_dir, 0)  # jump to the beginning of the central directory, so it gets overridden at close()

You can find the complete code in the latest version of the gist: https://gist.github.com/FreakyBytes/30a6f9866154d82f1c3863f2e4969cc4

Or in the repository of the library I am writing: https://github.com/FreakyBytes/pyCombineArchive
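For comparison, when an in-place edit is not required, an entry can also be dropped without patching zipfile by copying the archive into a new one. This is a different technique from the remove() method above: it is a rough sketch using only the stock zipfile API, the function name copy_without is hypothetical, and it assumes every entry fits in memory, since each one is decompressed and compressed again.

```python
import io
import zipfile


def copy_without(src, dst, names_to_drop):
    """Copy every entry of the archive src into a new archive dst,
    skipping the entries whose names are listed in names_to_drop.

    src and dst may be paths or binary file objects.
    (Hypothetical helper, not part of the zipfile module.)
    """
    drop = set(names_to_drop)
    with zipfile.ZipFile(src, "r") as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            if info.filename in drop:
                continue
            # Read the entry first, then write it with the same ZipInfo so
            # the name, timestamp and compression method are carried over.
            data = zin.read(info)
            zout.writestr(info, data)
```

Because the output archive is written from scratch, its local headers and central directory are always consistent, at the cost of rewriting the whole file.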