zipfile in Python produces not quite normal ZIP files

Time: 2016-08-18 08:16:47

Tags: python python-3.x zipfile

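In my project, a set of files is created and packed into a ZIP archive for use on an Android phone. The Android app opens such ZIP files to read the initial data and then stores the results of its work in the same ZIP. I have no access to the source code of the Android app or to the old script that produced the ZIP files before (in fact, I do not know how the old ZIP files were created). But the structure of the ZIP archive is known, and I wrote a new Python script to produce the same files.

I ran into the following problem: the ZIP files produced by my script cannot be opened by the Android app (an error message about a wrong file structure appears), but if I unpack all the contents and pack them back into a new ZIP file with the same name using WinZIP, 7-Zip or "Send to -> Compressed (zipped) folder" (in Windows 7), the file is processed normally on the phone (which leads me to conclude that the problem is not in the Android app).

The code fragment that packs the folder into a ZIP is as follows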

# make zip
try:
    with zipfile.ZipFile(prefix + '.zip', 'w') as zipf:
        for root, dirs, files in os.walk(prefix):
            for file in files:
                zipf.write(os.path.join(root, file))
    # remove dir, that was packed
    shutil.rmtree(prefix)
    # Report about resulting
    print('File ' + prefix + '.zip was created')
except:
    print('Unexpected error occurred while creating file ' + prefix + '.zip')

After I noticed that the files were not compressed, I added the compression option:

 zipfile.ZipFile(prefix + '.zip', 'w', zipfile.ZIP_DEFLATED) 

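But this did not solve my problem, and setting allowZip64 to True did not change the situation either.

By the way, the ZIP file produced with zipfile.ZIP_DEFLATED is about 5 kilobytes smaller than the one produced by Windows and about 14 kilobytes smaller than the 7-Zip result for the same archive contents. At the same time, I can open all of these ZIP files with 7-Zip and Windows Explorer for visual comparison.

So I have three related questions:

1) What could cause this strange behavior of zipfile in my script?

2) How can I influence zipfile?

3) How can I check a ZIP file created with zipfile to find possible structural problems, or to make sure there are none?

Of course, if I have to give up on zipfile, I can use an external archiver (e.g. 7-Zip) to pack the files, but I would like to find an elegant solution if one exists.

Update

To check the contents of the ZIP file created with zipfile, I did the following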

# make zip
flist = []
try:
    with zipfile.ZipFile(prefix + '.zip', 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(prefix):
            for file in files:
                zipf.write(os.path.join(root, file))
                # Store item in the list
                flist.append(os.path.join(root, file).replace("\\","/"))
    # remove dir, that was packed
    shutil.rmtree(prefix)
    # Report about resulting
    print('File ' + prefix + '.zip was created')
except:
    print('Unexpected error occurred while creating file ' + prefix + '.zip')
# Check of zip
with closing(zipfile.ZipFile(prefix + '.zip')) as zfile:
    for info in zfile.infolist():
        print(info.filename + \
              '  (extra = ' + str(info.extra) + \
              '; compress_type = ' + ('ZIP_DEFLATED' if info.compress_type == zipfile.ZIP_DEFLATED else 'NOT ZIP_DEFLATED')  + \
              ')')
        # remove item from list
        if info.filename in flist:
            flist.remove(info.filename)
        else:
            print(info.filename + ' is unexpected item')
print('Number of items that were missed:')
print(len(flist))

and saw the following result in the output:

File en_US_00001.zip was created
en_US_00001/en_US_00001_0001/en_US_00001_0001_big.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0001/en_US_00001_0001_info.xml  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0001/en_US_00001_0001_small.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.pkl  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.tex  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0001/en_US_00001_0001_user.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_big.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_info.xml  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_small.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.pkl  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.tex  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0002/en_US_00001_0002_user.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_big.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_info.xml  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_small.png  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.pkl  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.tex  (extra = b''; compress_type = ZIP_DEFLATED)
en_US_00001/en_US_00001_0003/en_US_00001_0003_user.png  (extra = b''; compress_type = ZIP_DEFLATED)
Number of items that were missed:
0

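So everything that was written was read back, but the question remains: was everything necessary written? For example, in the comments Harold mentioned relative paths... perhaps that is the key to the answer.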
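
On the point about relative paths: zipfile stores each entry under the path passed to write(), unless the optional arcname argument overrides it. A minimal sketch of how arcname determines the stored name, using a temporary directory and illustrative file names (none of them come from the script above):

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Illustrative layout: <tmp>/pack/sub/data.txt
    src = os.path.join(tmp, 'pack', 'sub', 'data.txt')
    os.makedirs(os.path.dirname(src))
    with open(src, 'w') as fh:
        fh.write('example')

    archive = os.path.join(tmp, 'pack.zip')
    with zipfile.ZipFile(archive, 'w', zipfile.ZIP_DEFLATED) as zf:
        # Store the entry relative to <tmp>, with forward slashes.
        arcname = os.path.relpath(src, tmp).replace(os.sep, '/')
        zf.write(src, arcname=arcname)

    with zipfile.ZipFile(archive) as zf:
        stored_names = zf.namelist()

print(stored_names)
```

Passing arcname explicitly makes the stored name independent of the current working directory and of the platform's path separator.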

Update 2

When I replace zipfile with the external 7-Zip, the code

# make zip
subprocess.call(["7z.exe","a",prefix + ".zip", prefix])
shutil.rmtree(prefix)
# Check of zip
with closing(zipfile.ZipFile(prefix + '.zip')) as zfile:
    for info in zfile.infolist():
        print(info.filename)
        print('  (extra = ' + str(info.extra) + '; compress_type = ' + str(info.compress_type) + ')')
print('Values for compress_type:')
print(str(zipfile.ZIP_DEFLATED) + ' = ZIP_DEFLATED')
print(str(zipfile.ZIP_STORED) + ' = ZIP_STORED')

produces the following result

Creating archive en_US_00001.zip

Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_big.png
Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_info.xml
Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_small.png
Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_source.pkl
Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_source.tex
Compressing  en_US_00001\en_US_00001_0001\en_US_00001_0001_user.png
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_big.png
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_info.xml
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_small.png
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_source.pkl
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_source.tex
Compressing  en_US_00001\en_US_00001_0002\en_US_00001_0002_user.png
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_big.png
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_info.xml
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_small.png
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_source.pkl
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_source.tex
Compressing  en_US_00001\en_US_00001_0003\en_US_00001_0003_user.png

Everything is Ok

en_US_00001/
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00Faf\xd2Y\xf9\xd1\x01Faf\xd2Y\xf9\xd1\x01%\xc9c\xd2Y\xf9\xd1\x01'; compress_type = 0)
en_US_00001/en_US_00001_0001/
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xbe(e\xd2Y\xf9\xd1\x01\xbe(e\xd2Y\xf9\xd1\x016\xf0c\xd2Y\xf9\xd1\x01'; compress_type = 0)
en_US_00001/en_US_00001_0001/en_US_00001_0001_big.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00G\x17d\xd2Y\xf9\xd1\x01G\x17d\xd2Y\xf9\xd1\x01G\x17d\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_info.xml
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00X>d\xd2Y\xf9\xd1\x01X>d\xd2Y\xf9\xd1\x01X>d\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_small.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00z\x8cd\xd2Y\xf9\xd1\x01ied\xd2Y\xf9\xd1\x01ied\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.pkl
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x8b\xb3d\xd2Y\xf9\xd1\x01\x8b\xb3d\xd2Y\xf9\xd1\x01\x8b\xb3d\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.tex
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xad\x01e\xd2Y\xf9\xd1\x01\xad\x01e\xd2Y\xf9\xd1\x01\xad\x01e\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_user.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xbe(e\xd2Y\xf9\xd1\x01\xbe(e\xd2Y\xf9\xd1\x01\xbe(e\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x005:f\xd2Y\xf9\xd1\x015:f\xd2Y\xf9\xd1\x01\xcfOe\xd2Y\xf9\xd1\x01'; compress_type = 0)
en_US_00001/en_US_00001_0002/en_US_00001_0002_big.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xe0ve\xd2Y\xf9\xd1\x01\xcfOe\xd2Y\xf9\xd1\x01\xcfOe\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_info.xml
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xf1\x9de\xd2Y\xf9\xd1\x01\xe0ve\xd2Y\xf9\xd1\x01\xe0ve\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_small.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x02\xc5e\xd2Y\xf9\xd1\x01\x02\xc5e\xd2Y\xf9\xd1\x01\x02\xc5e\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.pkl
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x13\xece\xd2Y\xf9\xd1\x01\x13\xece\xd2Y\xf9\xd1\x01\x13\xece\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.tex
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00$\x13f\xd2Y\xf9\xd1\x01$\x13f\xd2Y\xf9\xd1\x01$\x13f\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_user.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x005:f\xd2Y\xf9\xd1\x015:f\xd2Y\xf9\xd1\x015:f\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xdf\xc0g\xd2Y\xf9\xd1\x01\xdf\xc0g\xd2Y\xf9\xd1\x01Faf\xd2Y\xf9\xd1\x01'; compress_type = 0)
en_US_00001/en_US_00001_0003/en_US_00001_0003_big.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00W\x88f\xd2Y\xf9\xd1\x01W\x88f\xd2Y\xf9\xd1\x01W\x88f\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_info.xml
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00h\xaff\xd2Y\xf9\xd1\x01h\xaff\xd2Y\xf9\xd1\x01h\xaff\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_small.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\x9b$g\xd2Y\xf9\xd1\x01y\xd6f\xd2Y\xf9\xd1\x01y\xd6f\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.pkl
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xacKg\xd2Y\xf9\xd1\x01\xacKg\xd2Y\xf9\xd1\x01\xacKg\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.tex
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xce\x99g\xd2Y\xf9\xd1\x01\xce\x99g\xd2Y\xf9\xd1\x01\xce\x99g\xd2Y\xf9\xd1\x01'; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_user.png
    (extra = b'\n\x00 \x00\x00\x00\x00\x00\x01\x00\x18\x00\xdf\xc0g\xd2Y\xf9\xd1\x01\xdf\xc0g\xd2Y\xf9\xd1\x01\xdf\xc0g\xd2Y\xf9\xd1\x01'; compress_type = 8)

Values for compress_type:
8 = ZIP_DEFLATED
0 = ZIP_STORED

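As far as I can see, the most important findings are:

  • there are items with folder information (e.g. en_US_00001/ and en_US_00001/en_US_00001_0001/), which were absent from the ZIP I generated with zipfile
  • folder items have compress_type == ZIP_STORED, while file items have compress_type == ZIP_DEFLATED
  • extra has different values (fairly long byte strings are generated)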
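
The first finding can be checked mechanically. A minimal sketch, assuming that explicit directory entries are exactly the names ending in a forward slash; the helper names `directory_entries` and `build_archive` and the in-memory archives are illustrative, not part of the scripts above:

```python
import io
import zipfile


def directory_entries(zip_source):
    """Return the names of explicit directory entries (names ending in '/')."""
    with zipfile.ZipFile(zip_source) as zf:
        return [name for name in zf.namelist() if name.endswith('/')]


def build_archive(with_dirs):
    """Build a small in-memory archive, optionally with directory entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
        if with_dirs:
            zf.writestr(zipfile.ZipInfo('demo/'), '')
            zf.writestr(zipfile.ZipInfo('demo/sub/'), '')
        zf.writestr('demo/sub/file.txt', 'hello')
    buf.seek(0)
    return buf


files_only = directory_entries(build_archive(with_dirs=False))
with_folders = directory_entries(build_archive(with_dirs=True))
print(files_only)
print(with_folders)
```

Running `directory_entries` on an archive written by zipfile and on one written by 7-Zip should show the same difference as the listings above.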

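1 answer:

Answer 0 (score: 1)

Based on the differences listed in Update 2 of the question and on the example in another question about zipfile, I tried adding the directories to the ZIP file with the following code and checked the result: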

# make zip
try:
    with zipfile.ZipFile(prefix + '.zip', 'w', zipfile.ZIP_DEFLATED) as zipf:
        info = zipfile.ZipInfo(prefix+'\\')
        zipf.writestr(info, '')
        for root, dirs, files in os.walk(prefix):
            for d in dirs:
                info = zipfile.ZipInfo(os.path.join(root, d)+'\\')
                zipf.writestr(info, '')
            for file in files:
                zipf.write(os.path.join(root, file))
    # remove dir, that was packed
    shutil.rmtree(prefix)
    # Report about resulting
    print('File ' + prefix + '.zip was created')
except:
    print('Unexpected error occurred while creating file ' + prefix + '.zip')
# Check zip content
with closing(zipfile.ZipFile(prefix + '.zip')) as zfile:
    for info in zfile.infolist():
        print(info.filename)
        print('  (extra = ' + str(info.extra) + '; compress_type = ' + str(info.compress_type) + ')')
print('Values for compress_type:')
print(str(zipfile.ZIP_DEFLATED) + ' = ZIP_DEFLATED')
print(str(zipfile.ZIP_STORED) + ' = ZIP_STORED')

Output

File en_US_00001.zip was created
en_US_00001/
  (extra = b''; compress_type = 0)
en_US_00001/en_US_00001_0001/
  (extra = b''; compress_type = 0)
en_US_00001/en_US_00001_0002/
  (extra = b''; compress_type = 0)
en_US_00001/en_US_00001_0003/
  (extra = b''; compress_type = 0)
en_US_00001/en_US_00001_0001/en_US_00001_0001_big.png
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_info.xml
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_small.png
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.pkl
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_source.tex
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0001/en_US_00001_0001_user.png
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_big.png
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_info.xml
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_small.png
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.pkl
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_source.tex
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0002/en_US_00001_0002_user.png
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_big.png
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_info.xml
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_small.png
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.pkl
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_source.tex
  (extra = b''; compress_type = 8)
en_US_00001/en_US_00001_0003/en_US_00001_0003_user.png
  (extra = b''; compress_type = 8)
Values for compress_type:
8 = ZIP_DEFLATED
0 = ZIP_STORED

Adding a slash to the directory name (+'\\') seems to be mandatory.

Most importantly, the ZIP file is now accepted correctly by the Android app.
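
The same idea can be sketched in a platform-independent form. The helper `zip_with_dir_entries` below is hypothetical, not the code used in this answer; it assumes the reader of the archive needs an explicit entry for every directory, writes each one with a trailing forward slash, and then runs ZipFile.testzip(), which returns None when no member has a bad CRC or header. Note that testzip() only checks integrity; it cannot tell whether a particular application will accept the archive layout.

```python
import os
import tempfile
import zipfile


def zip_with_dir_entries(folder, archive_path):
    """Zip `folder`, adding an explicit entry for every directory."""
    parent = os.path.dirname(os.path.abspath(folder))
    with zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(folder):
            rel_root = os.path.relpath(root, parent).replace(os.sep, '/')
            # Directory entry: name ends with '/', stored uncompressed.
            zf.writestr(zipfile.ZipInfo(rel_root + '/'), '')
            for name in files:
                zf.write(os.path.join(root, name), rel_root + '/' + name)


with tempfile.TemporaryDirectory() as tmp:
    # Illustrative input: <tmp>/demo/part1/info.xml
    folder = os.path.join(tmp, 'demo')
    os.makedirs(os.path.join(folder, 'part1'))
    with open(os.path.join(folder, 'part1', 'info.xml'), 'w') as fh:
        fh.write('<info/>')

    archive_path = os.path.join(tmp, 'demo.zip')
    zip_with_dir_entries(folder, archive_path)

    with zipfile.ZipFile(archive_path) as zf:
        names = zf.namelist()
        first_bad = zf.testzip()  # None means no corrupt member was found

print(names)
print(first_bad)
```

Building the entry names with '/' directly avoids relying on the platform's separator being converted when the archive is written.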