Question

OK, so I have a zip file that contains gz files (Unix gzip).

Here is what I'm doing:

def parseSTS(file):
    import zipfile, re, io, gzip
    with zipfile.ZipFile(file, 'r') as zfile:
        for name in zfile.namelist():
            if re.search(r'\.gz$', name) != None:
                zfiledata = zfile.open(name)
                print("start for file ", name)
                with gzip.open(zfiledata,'r') as gzfile:
                    print("done opening")
                    filecontent = gzfile.read()
                    print("done reading")
                    print(filecontent)

This gives the following result:

>>> 
start for file  XXXXXX.gz
done opening
done reading

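and then it stays like that forever until it crashes...

What can I do with filecontent?

Edit: this is not a duplicate, because my gzip files are inside a zip file and I'm trying to avoid extracting that zip file to disk. The same approach works for a zip file inside a zip file, as described in How to read from a zip file within zip file in Python?.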

Answer 1

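I created a zip file containing a gzip'ed PDF file that I grabbed from the web.

I ran this code (with two small changes):

1) Fixed the indentation of everything under the def statement (I also made that correction in your question, because I'm sure it is right on your end, otherwise you would never have reached the problem you describe).

2) I changed: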

            zfiledata = zfile.open(name)
            print("start for file ", name)
            with gzip.open(zfiledata,'r') as gzfile:
                print("done opening")
                filecontent = gzfile.read()
                print("done reading")
                print(filecontent)

to:

            print("start for file ", name)
            with gzip.open(name,'rb') as gzfile:
                print("done opening")
                filecontent = gzfile.read()
                print("done reading")
                print(filecontent)

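because you were passing a file object to gzip.open instead of a string. I don't know how your code executed without that change, but for me it kept crashing until I fixed it.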
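To illustrate the difference between a path string and a file object, here is a minimal sketch, assuming Python 3. A string passed to gzip.open is treated as a path on disk, whereas gzip.GzipFile(fileobj=...) wraps a binary stream that is already open. The helper name gunzip_stream and the sample payload are hypothetical, used only for illustration.

```python
import gzip
import io


def gunzip_stream(stream):
    """Decompress gzip data from an already-open binary file-like object."""
    # fileobj= makes GzipFile read from the stream instead of opening a path.
    with gzip.GzipFile(fileobj=stream, mode='rb') as gz:
        return gz.read()


# Hypothetical sample data standing in for a .gz member read from a zip archive.
payload = gzip.compress(b'hello from inside a gzip stream\n')
print(gunzip_stream(io.BytesIO(payload)))
```

Any readable binary stream should work here, including an in-memory io.BytesIO buffer.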

Edit: added a link to the GZIP documentation from James R's answer:

Also, see here for more documentation:

http://docs.python.org/2/library/gzip.html#examples-of-usage

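End edit

Now, since my gzip file is small, the behavior I observed was a pause of about 3 seconds after "done reading" was printed, and then the contents of filecontent were output.

I suggest adding the following debugging line after printing "done reading": print(len(filecontent)). If this number is very, very large, consider not printing the entire file contents in one shot.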
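A minimal sketch of that debugging idea, assuming Python 3 and that filecontent is a bytes object: report the size and only a short preview rather than the whole payload. The function name describe_content and the preview_bytes parameter are hypothetical.

```python
def describe_content(filecontent, preview_bytes=200):
    """Return a one-line summary of a bytes object instead of the whole payload."""
    size = len(filecontent)
    preview = filecontent[:preview_bytes]
    # Mark the summary as truncated when the content is longer than the preview.
    suffix = '' if size <= preview_bytes else ' ...'
    return '%d bytes, starts with %r%s' % (size, preview, suffix)


# Stand-in data; in the question this would be the result of gzfile.read().
print(describe_content(b'x' * 1000, preview_bytes=10))
```

If the reported size is huge, the apparent hang is probably the terminal rendering the output rather than the decompression itself.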

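I would also suggest reading this to get a better understanding of what I expect your problem to be: Why is printing to stdout so slow? Can it be sped up?

Edit 2: an alternative, in case your system does not handle file I/O on files inside the zip, which causes "no such file" errors with the code above: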

import gzip
import io
import zipfile

def parseSTS(afile):
    with zipfile.ZipFile(afile, 'r') as archive:
        for name in archive.namelist():
            if name.endswith('.gz'):
                # Read the compressed member into memory; nothing is written to disk.
                compressed = archive.read(name)
                # Wrap the bytes in a file-like object so GzipFile can read from it.
                with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode='rb') as g:
                    content = g.read()
                print(content)

parseSTS('t.zip')
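If the .gz members are large, reading each one fully into memory may be undesirable. The following is a sketch of a streaming variant, assuming Python 3, where the object returned by ZipFile.open can be handed to gzip.GzipFile directly. The function name iter_gz_members, the chunk_size parameter, and the in-memory demo archive are hypothetical.

```python
import gzip
import io
import zipfile


def iter_gz_members(zip_path, chunk_size=64 * 1024):
    """Yield (member_name, decompressed_size) for each .gz member of a zip."""
    with zipfile.ZipFile(zip_path, 'r') as archive:
        for name in archive.namelist():
            if not name.endswith('.gz'):
                continue
            # archive.open gives a readable stream; nothing is extracted to disk.
            with archive.open(name) as member, \
                    gzip.GzipFile(fileobj=member, mode='rb') as gz:
                total = 0
                while True:
                    chunk = gz.read(chunk_size)
                    if not chunk:
                        break
                    # Process each chunk here instead of printing everything at once.
                    total += len(chunk)
            yield name, total


# Demo with a small zip built in memory (stand-in for a real file such as t.zip).
demo = io.BytesIO()
with zipfile.ZipFile(demo, 'w') as z:
    z.writestr('sample.gz', gzip.compress(b'hello' * 1000))
    z.writestr('notes.txt', b'not gzip, skipped')
demo.seek(0)
for member_name, size in iter_gz_members(demo):
    print(member_name, size)
```

Reading in fixed-size chunks keeps memory use bounded by chunk_size rather than by the size of the decompressed file.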

Answer 2

Most likely your problem lies here:

        if name.endswith(".gz"): #as goncalopp said in the comments, use endswith
            #zfiledata = zfile.open(name) #don't do this
            #print("start for file ", name)
            with gzip.open(name,'rb') as gzfile: #gz compressed files should be read in binary and gzip opens the files directly
                #print("done opening") #trust in your program, luke
                filecontent = gzfile.read()
                #print("done reading")
                print(filecontent)

See here for more documentation:

http://docs.python.org/2/library/gzip.html#examples-of-usage
