Question

I have a zip file in S3 that I am trying to download and then unzip.

I wrote the following code:

from os import system
from boto.s3.connection import S3Connection

conn = S3Connection('','')
bucket = conn.get_bucket('buck1')
key = bucket.get_key("folder1/foldr2/file1.gz")

f = open('/folder1/folder2/file1.gz', 'w')
key.get_file(f)
f.close()

cmd = 'unzip /folder1/folder2/file1.gz'
system(cmd)

But this results in the following error:

End-of-central-directory signature not found.  Either this file is not
Archive:  /folder1/folder2/file1.gz
a zipfile, or it constitutes one disk of a multi-part archive.  In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of /folder1/folder2/file1.gz or
        /folder1/folder2/file1.gz.zip, and cannot find /folder1/folder2/file1.gz.ZIP, period.

I also tried the following code to unzip it, but it fails too, with an error saying the file cannot be unzipped because it does not seem to be a zip file:

zip_ref = zipfile.ZipFile('/folder1/folder2/file1.gz', 'r')
zip_ref.extractall('/folder1/folder2/')
zip_ref.close()

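I know this error appears when something is wrong with the zip file, but what I don't understand is that I am simply pulling the file from S3 and trying to unzip it. How can I resolve this error and get the desired result?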

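Note: I also cannot unzip the downloaded file manually on my (Linux) machine. I can see the downloaded file, but unzipping it fails with an error. If I download the file from S3 manually and then unzip it manually, it unzips without any errors.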

Answer 1

AFAIK, you cannot open a gzip archive with unzip. unzip only works with .ZIP files. Use the gunzip command for .gz files:

cmd = 'gunzip /folder1/folder2/file1.gz'
system(cmd)

Edit

If the file is still reported as corrupt, you should make sure it actually is a gzip file. Try the following:

hd /folder1/folder2/file1.gz | head

You should get something like:

00000000 1f 8b 08 08 0e 7f fc 50 00 03 63 6f 70 79 5f 63 |.......P..copy_c|

Make sure the first two octets after the 00000000 offset are 1f 8b, which is the magic number at the start of a gzip file's header.
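The same check can be done from Python instead of hd. This is only a minimal sketch: the function name sniff_archive_type is made up for illustration, and it only looks at the leading bytes (1f 8b for gzip, "PK" signatures for zip), so it says nothing about whether the rest of the file is intact.

```python
def sniff_archive_type(path):
    """Guess the archive type from the file's leading bytes.

    Hypothetical helper, not part of boto or the standard library.
    Returns 'gzip', 'zip', or 'unknown'.
    """
    with open(path, 'rb') as f:
        head = f.read(4)
    # gzip files start with the two magic bytes 1f 8b
    if head[:2] == b'\x1f\x8b':
        return 'gzip'
    # zip files start with "PK\x03\x04" (or "PK\x05\x06" for an empty archive)
    if head in (b'PK\x03\x04', b'PK\x05\x06'):
        return 'zip'
    return 'unknown'
```

Calling sniff_archive_type('/folder1/folder2/file1.gz') on the downloaded file should tell you whether gunzip or unzip is the right tool, or whether the download produced something else entirely.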

Answer 2

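While the file most likely is not a good .gz file (and, as noted above, you cannot use unzip on .gz gzip files), there is another way to download the file without using a file handle.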

Based on your code:

key.get_contents_to_filename('/path/to/file.gz')

You can also take a look at the gzip module: https://docs.python.org/2/library/gzip.html
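For example, something along these lines should decompress the file without shelling out to gunzip. It is a rough sketch that assumes the download really is a valid gzip file; the helper name gunzip_file and the choice of output path are illustrative, not taken from the question.

```python
import gzip
import shutil


def gunzip_file(src_path, dst_path):
    """Decompress the gzip file at src_path into dst_path.

    Hypothetical helper for illustration. Data is streamed in chunks,
    so the whole file is never held in memory at once.
    """
    with gzip.open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)
```

You would call it as gunzip_file('/folder1/folder2/file1.gz', '/folder1/folder2/file1'). If the file is not actually gzip data, reading from it raises an error instead of silently writing garbage, which is also a quick way to confirm the download is the problem.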

Error when unzipping a file downloaded from S3 with Python
