Question

I have a gzip file and I am trying to read it with Python, as shown below:

import zlib

do = zlib.decompressobj(16+zlib.MAX_WBITS)
fh = open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
data = do.decompress(cdata)

It throws this error:

zlib.error: Error -3 while decompressing: incorrect header check

How can I overcome it?

Answer 1

You have this error:

zlib.error: Error -3 while decompressing: incorrect header check

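The most likely cause is that you are checking for a header that is not there, e.g. your data follows RFC 1951 (the deflate compressed format) rather than RFC 1950 (the zlib compressed format) or RFC 1952 (the gzip compressed format).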
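To make the difference between the three formats concrete, here is a minimal sketch that guesses the wrapper from the first two bytes. The helper name guess_container is hypothetical, and the check is only a heuristic: a raw deflate stream can, by chance, start with bytes that look like a zlib header.

```python
import gzip
import zlib


def guess_container(data: bytes) -> str:
    """Guess which wrapper (if any) surrounds a DEFLATE stream.

    Hypothetical helper: a heuristic, not a validator.
    """
    if data[:2] == b"\x1f\x8b":
        return "gzip"  # RFC 1952 magic number
    if len(data) >= 2:
        cmf, flg = data[0], data[1]
        # RFC 1950: the low nibble of CMF is 8 (deflate) and the
        # 16-bit header value is a multiple of 31.
        if cmf & 0x0F == 8 and (cmf * 256 + flg) % 31 == 0:
            return "zlib"
    return "raw deflate"  # or not compressed data at all


payload = b"test"
raw = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw_data = raw.compress(payload) + raw.flush()

print(guess_container(gzip.compress(payload)))
print(guess_container(zlib.compress(payload)))
print(guess_container(raw_data))
```

If the guess comes back as "raw deflate" for a file that is supposed to be gzip, the file is probably not what its extension claims.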

Choosing windowBits

But zlib can decompress all of those formats:

to (de-)compress deflate format, use wbits = -zlib.MAX_WBITS
to (de-)compress zlib format, use wbits = zlib.MAX_WBITS
to (de-)compress gzip format, use wbits = zlib.MAX_WBITS | 16

See the documentation at http://www.zlib.net/manual.html#Advanced (section inflateInit2).
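As a rough sketch of how the wbits value can be applied to a file like the one in the question, the function below feeds the file to a decompressobj in chunks. The name read_compressed, the chunk size, and the throwaway demo file are all assumptions made for illustration.

```python
import gzip
import os
import tempfile
import zlib


def read_compressed(path, wbits=zlib.MAX_WBITS | 16, chunk_size=64 * 1024):
    """Decompress a file in chunks; wbits selects the expected format."""
    decompressor = zlib.decompressobj(wbits)
    pieces = []
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            pieces.append(decompressor.decompress(chunk))
    # flush() returns whatever output is still buffered.
    pieces.append(decompressor.flush())
    return b"".join(pieces)


# Demo with a throwaway gzip file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.gz")
    with open(demo_path, "wb") as fh:
        fh.write(gzip.compress(b"hello " * 1000))
    restored = read_compressed(demo_path)

print(len(restored))
```

Passing -zlib.MAX_WBITS or zlib.MAX_WBITS as wbits would read raw deflate or zlib data the same way.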

Examples

Test data:

>>> import zlib
>>> deflate_compress = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
>>> zlib_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS)
>>> gzip_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
>>> 
>>> text = b'''test'''
>>> deflate_data = deflate_compress.compress(text) + deflate_compress.flush()
>>> zlib_data = zlib_compress.compress(text) + zlib_compress.flush()
>>> gzip_data = gzip_compress.compress(text) + gzip_compress.flush()
>>>

Obvious test for zlib:

>>> zlib.decompress(zlib_data)
b'test'

Test for deflate:

>>> zlib.decompress(deflate_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(deflate_data, -zlib.MAX_WBITS)
b'test'

Test for gzip:

>>> zlib.decompress(gzip_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|16)
b'test'

The data is also compatible with the gzip module:

>>> import gzip
>>> import io
>>> fio = io.BytesIO(gzip_data)
>>> f = gzip.GzipFile(fileobj=fio)
>>> f.read()
b'test'
>>> f.close()

Automatic header detection (zlib or gzip)

Adding 32 to windowBits will trigger header detection:

>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|32)
b'test'
>>> zlib.decompress(zlib_data, zlib.MAX_WBITS|32)
b'test'

Using `gzip` instead

Alternatively, you can ignore zlib and use the gzip module directly; but please remember that under the hood, gzip uses zlib.

fh = gzip.open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
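A small sketch of the same idea with context managers and an in-memory round trip. The temporary file stands in for abc.gz and is an assumption made so the example runs on its own.

```python
import gzip
import os
import tempfile
import zlib

original = b"test"

# In-memory round trip using the gzip module alone.
blob = gzip.compress(original)
via_gzip = gzip.decompress(blob)

# The same bytes are readable through zlib with the gzip wbits value.
via_zlib = zlib.decompress(blob, zlib.MAX_WBITS | 16)

# File-based variant; the with statement closes the handle even on errors.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "abc.gz")  # throwaway stand-in for the real file
    with gzip.open(path, "wb") as fh:
        fh.write(original)
    with gzip.open(path, "rb") as fh:
        cdata = fh.read()

print(via_gzip, via_zlib, cdata)
```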

Answer 2

Update: dnozay's answer explains the problem and should be the accepted answer.

Try the gzip module; the code below is taken straight from the Python docs.

import gzip
f = gzip.open('/home/joe/file.txt.gz', 'rb')
file_content = f.read()
f.close()

Answer 3

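I just solved the "incorrect header check" problem when uncompressing gzipped data.

You need to set -WindowBits => WANT_GZIP in your call to inflateInit2 (use the 2 version).

Yes, this can be very frustrating. A typically shallow reading of the documentation presents Zlib as an API to Gzip compression, but by default (not using the gz* methods) it does not create or uncompress the Gzip format. You have to pass this not very prominently documented flag.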

Answer 4

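Funnily enough, I ran into this error when trying to work with the Stack Overflow API using Python.

I managed to get it working with the GzipFile object from the gzip module, roughly like this: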

import gzip

gzip_file = gzip.GzipFile(fileobj=open('abc.gz', 'rb'))

file_contents = gzip_file.read()

Answer 5

My case was decompressing email messages stored in a Bullhorn database. The snippet is as follows:

import pyodbc
import zlib

cn = pyodbc.connect('connection string')
cursor = cn.cursor()
cursor.execute('SELECT TOP(1) userMessageID, commentsCompressed FROM BULLHORN1.BH_UserMessage WHERE DATALENGTH(commentsCompressed) > 0 ')

for msg in cursor.fetchall():
    # magic in the second parameter: use a negative value for the raw deflate format
    decompressedMessageBody = zlib.decompress(bytes(msg.commentsCompressed), -zlib.MAX_WBITS)

Answer 6

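For decompressing incomplete gzipped bytes that are in memory, the answer by dnozay is useful, but it omits the call to zlib.decompressobj, which I found to be necessary:

incomplete_decompressed_content = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(incomplete_gzipped_content)

Note that zlib.MAX_WBITS | 16 is 15 | 16, which is 31. For some background about wbits, see zlib.decompress.

Credit: answer by Yann Vernier, which notes the call to zlib.decompressobj.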
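To illustrate why the decompressobj call matters, the sketch below truncates a gzip stream on purpose. The sample data and the cut point (half of the compressed bytes) are arbitrary choices for the demo.

```python
import gzip
import zlib

original = b"".join(b"line %d\n" % i for i in range(10000))
complete = gzip.compress(original)
truncated = complete[: len(complete) // 2]  # simulate an incomplete download

# The one-shot function insists on a complete stream and raises.
try:
    zlib.decompress(truncated, zlib.MAX_WBITS | 16)
    one_shot_failed = False
except zlib.error as exc:
    one_shot_failed = True
    print("one-shot failed:", exc)

# A decompression object returns whatever it can decode so far.
partial = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(truncated)
print(len(partial), "of", len(original), "bytes recovered")
```

The recovered bytes are a prefix of the original data; nothing verifies them, because the CRC sits at the end of the gzip stream that was cut off.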

Answer 7

This does not answer the original question, but it may help someone else who ends up here.

The zlib.error: Error -3 while decompressing: incorrect header check also occurs in the example below:

import base64
import zlib

b64_encoded_bytes = base64.b64encode(zlib.compress(b'abcde'))
encoded_bytes_representation = str(b64_encoded_bytes)  # this is the cause
zlib.decompress(base64.b64decode(encoded_bytes_representation))  # raises zlib.error

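The example is a minimal reproduction of something I encountered in some legacy Django code, where Base64-encoded bytes (from an HTTP POST) were stored in a Django CharField (instead of a BinaryField).

When reading a CharField value from the database, str() is called on the value without an explicit encoding, as can be seen in the Django source.

The str() documentation says:

If neither encoding nor errors is given, str(object) returns object.__str__(), which is the "informal" or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a __str__() method, then str() falls back to returning repr(object).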

So, in the example, we are inadvertently base64-decoding

"b'eJxLTEpOSQUABcgB8A=='"

instead of

b'eJxLTEpOSQUABcgB8A=='.

The zlib decompression in the example would succeed if an explicit encoding were used, e.g. str(b64_encoded_bytes, 'utf-8').
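A side-by-side sketch of the failing and the working conversion. The variable names wrong and right are made up for the demo, and both exception types are caught because the exact failure can depend on how the stray characters interact with Base64 padding.

```python
import base64
import binascii
import zlib

b64_encoded_bytes = base64.b64encode(zlib.compress(b"abcde"))

wrong = str(b64_encoded_bytes)           # repr-style text: "b'...'"
right = str(b64_encoded_bytes, "utf-8")  # the Base64 text itself

try:
    zlib.decompress(base64.b64decode(wrong))
    wrong_failed = False
except (zlib.error, binascii.Error) as exc:
    wrong_failed = True
    print("decoding the repr failed:", exc)

recovered = zlib.decompress(base64.b64decode(right))
print(recovered)
```

b64_encoded_bytes.decode('ascii') is an equivalent way to get the right string.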

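Note specific to Django:

What makes this especially tricky: the issue only arises when retrieving a value from the database. See, for example, the test below, which passes (in Django 3.0.3):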

class MyModelTests(TestCase):
    def test_bytes(self):
        my_model = MyModel.objects.create(data=b'abcde')
        self.assertIsInstance(my_model.data, bytes)  # issue does not arise
        my_model = MyModel.objects.first()
        self.assertIsInstance(my_model.data, str)  # issue does arise

where MyModel is

class MyModel(models.Model):
    data = models.CharField(max_length=100)

Answer 8

Just add the header 'Accept-Encoding': 'identity'
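A minimal sketch of setting that header, assuming the standard-library urllib client rather than whichever HTTP library the answer had in mind. The URL is a placeholder, and the request is only built here, not sent.

```python
import urllib.request

url = "https://example.invalid/resource"  # placeholder, not a real endpoint

# "identity" asks the server not to compress the response body.
request = urllib.request.Request(url, headers={"Accept-Encoding": "identity"})

# urllib stores header names capitalised as "Accept-encoding".
print(request.get_header("Accept-encoding"))

# Sending it would look like this (left commented out: needs a real URL):
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```

With an uncompressed response there is nothing for the client to inflate, so the header check never runs.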


https://github.com/requests/requests/issues/3849

zlib.error: Error -3 while decompressing: incorrect header check

8 answers:
