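Is this a bug in this gzip inflate method?

Asked: 2013-07-23 20:42:40

Tags: python ios compression gzip inflate

When searching for how to inflate gzip-compressed data on iOS, the following method shows up in a number of the results: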

- (NSData *)gzipInflate
{
    if ([self length] == 0) return self;

    unsigned full_length = [self length];
    unsigned half_length = [self length] / 2;

    NSMutableData *decompressed = [NSMutableData dataWithLength: full_length + half_length];
    BOOL done = NO;
    int status;

    z_stream strm;
    strm.next_in = (Bytef *)[self bytes];
    strm.avail_in = [self length];
    strm.total_out = 0;
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;

    if (inflateInit2(&strm, (15+32)) != Z_OK) return nil;
    while (!done)
    {
        // Make sure we have enough room and reset the lengths.
        if (strm.total_out >= [decompressed length])
            [decompressed increaseLengthBy: half_length];
        strm.next_out = [decompressed mutableBytes] + strm.total_out;
        strm.avail_out = [decompressed length] - strm.total_out;

        // Inflate another chunk.
        status = inflate (&strm, Z_SYNC_FLUSH);
        if (status == Z_STREAM_END) done = YES;
        else if (status != Z_OK) break;
    }
    if (inflateEnd (&strm) != Z_OK) return nil;

    // Set real length.
    if (done)
    {
        [decompressed setLength: strm.total_out];
        return [NSData dataWithData: decompressed];
    }
    else return nil;
}

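But I have come across some data samples (deflated on a Linux machine with Python's gzip module) that this method, running on iOS, fails to inflate. Here is what happens:

In the last iteration of the while loop, inflate() returns Z_BUF_ERROR and the loop exits. But inflateEnd(), which is called after the loop, returns Z_OK. The code then assumes that because inflate() never returned Z_STREAM_END, the inflation failed, and it returns nil.

According to this page, http://www.zlib.net/zlib_faq.html#faq05, Z_BUF_ERROR is not a fatal error, and my tests with a limited number of examples show that the data is inflated successfully if inflateEnd() returns Z_OK, even though the last call to inflate() did not return Z_OK. It seems as though inflateEnd() finishes inflating the last chunk of data.

I don't know much about compression or how gzip works, so I'm hesitant to change this code without fully understanding what it does. I'm hoping someone with more knowledge of the topic can shed some light on this potential logic flaw in the code above and suggest a way to fix it.

Another method that Google turns up seems to suffer from the same problem: https://github.com/nicklockwood/GZIP/blob/master/GZIP/NSData%2BGZIP.m

Edit

So, it is a bug! Now, how do we fix it? Below is my attempt. Code review, anyone?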

- (NSData *)gzipInflate
{
    if ([self length] == 0) return self;

    unsigned full_length = [self length];
    unsigned half_length = [self length] / 2;

    NSMutableData *decompressed = [NSMutableData dataWithLength: full_length + half_length];
    int status;

    z_stream strm;
    strm.next_in = (Bytef *)[self bytes];
    strm.avail_in = [self length];
    strm.total_out = 0;
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;

    if (inflateInit2(&strm, (15+32)) != Z_OK) return nil;

    do
    {
        // Make sure we have enough room and reset the lengths.
        if (strm.total_out >= [decompressed length])
            [decompressed increaseLengthBy: half_length];
        strm.next_out = [decompressed mutableBytes] + strm.total_out;
        strm.avail_out = [decompressed length] - strm.total_out;

        // Inflate another chunk.
        status = inflate (&strm, Z_SYNC_FLUSH);

        switch (status) {
            case Z_NEED_DICT:
                status = Z_DATA_ERROR;     /* and fall through */
            case Z_DATA_ERROR:
            case Z_MEM_ERROR:
            case Z_STREAM_ERROR:
                (void)inflateEnd(&strm);
                return nil;
        }
    } while (status != Z_STREAM_END);

    (void)inflateEnd (&strm);

    // Set real length.
    if (status == Z_STREAM_END)
    {
        [decompressed setLength: strm.total_out];
        return [NSData dataWithData: decompressed];
    }
    else return nil;
}

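Edit 2:

Here is a sample Xcode project that illustrates the issue I'm running into. The deflate happens on the server side, and the data is base64 and URL encoded before being transported over HTTP. I have embedded the URL-encoded base64 string in ViewController.m. The url-decode and base64-decode methods, as well as your gzipInflate method, are in NSDataExtension.m.

https://dl.dropboxusercontent.com/u/38893107/gzip/GZIPTEST.zip

Here is the binary file as deflated by the Python gzip library:

https://dl.dropboxusercontent.com/u/38893107/gzip/binary.zip

Here is the URL-encoded base64 string that is transported over HTTP: https://dl.dropboxusercontent.com/u/38893107/gzip/urlEncodedBase64.txt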

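2 Answers:

Answer 0: (score: 7)

Yes, it is a bug.

In fact, if inflate() does not return Z_STREAM_END, then you have not completed inflation. inflateEnd() returning Z_OK doesn't mean much: only that it was given a valid state and was able to free the memory.

So inflate() must eventually return Z_STREAM_END before you can declare success. However, Z_BUF_ERROR is not a reason to give up. In that case you simply call inflate() again with more input or more output space. Then you will get Z_STREAM_END.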
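The same rule can be checked from the Python side, where the test data was produced. This is a minimal sketch, not the iOS fix: it assumes Python's standard zlib module as a stand-in for the C API, and inflate_gzip is a hypothetical helper name. The eof attribute of the decompression object is true only once the end of the compressed stream has been reached, which is the condition under which inflate() reports Z_STREAM_END.

```python
import gzip
import zlib


def inflate_gzip(data):
    """Hypothetical helper: return (output, finished).

    finished is True only if the end of the gzip stream was reached.
    """
    d = zlib.decompressobj(15 + 16)  # 15 + 16 selects gzip decoding
    out = d.decompress(data)         # a truncated stream does not raise here
    return out, d.eof


payload = b"hello gzip " * 1000
blob = gzip.compress(payload)

# Complete stream: all output produced and end of stream seen.
full, finished = inflate_gzip(blob)

# Truncated stream: some output may be produced, but the end is never seen,
# so the result must not be reported as a success.
partial, finished_partial = inflate_gzip(blob[:-10])

print(finished, finished_partial)
```

The truncated call returns without an error, much like inflateEnd() returning Z_OK, so the absence of an error is not evidence that decompression completed.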

From the documentation in zlib.h:

/* ...
Z_BUF_ERROR if no progress is possible or if there was not enough room in the
output buffer when Z_FINISH is used.  Note that Z_BUF_ERROR is not fatal, and
inflate() can be called again with more input and more output space to
continue decompressing.
... */
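The "call again with more output space" pattern from the quoted comment can be sketched in Python as follows. This is an illustration under assumptions: inflate_in_steps and out_step are hypothetical names, and Python's zlib module hides the Z_BUF_ERROR code itself, so the loop relies on the eof and unconsumed_tail attributes of the decompression object instead.

```python
import gzip
import zlib


def inflate_in_steps(data, out_step=4096):
    """Hypothetical helper: inflate gzip data, at most out_step bytes per call."""
    d = zlib.decompressobj(15 + 16)  # gzip decoding
    chunks = []
    pending = data
    while True:
        # Limit the output of this call, like a fixed-size output buffer.
        chunk = d.decompress(pending, out_step)
        chunks.append(chunk)
        if d.eof:
            break  # end of stream reached: success
        # Input that was not consumed because the output limit was hit.
        pending = d.unconsumed_tail
        if not chunk and not pending:
            # No output produced and no input left: the stream is incomplete.
            raise ValueError("incomplete gzip stream")
    return b"".join(chunks)


payload = b"abc" * 100000
blob = gzip.compress(payload)
result = inflate_in_steps(blob, 1024)
print(len(result) == len(payload))
```

Running out of output room is simply a reason to go around the loop again. The loop gives up only when no progress is possible at all, which is the situation a truncated input produces.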

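Update

Since there is buggy code floating around out there, below is proper code to implement the desired method. This code handles incomplete gzip streams, concatenated gzip streams, and very large gzip streams. For very large gzip streams, the unsigned lengths in z_stream are not large enough when compiled as a 64-bit executable: NSUInteger is 64 bits, whereas unsigned is 32 bits. In that case, you have to loop over the input to feed it to inflate().

This example simply returns nil on any error. The nature of the error is noted in a comment after each return nil;, in case more sophisticated error handling is desired.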

- (NSData *) gzipInflate
{
    z_stream strm;

    // Initialize input
    strm.next_in = (Bytef *)[self bytes];
    NSUInteger left = [self length];        // input left to decompress
    if (left == 0)
        return nil;                         // incomplete gzip stream

    // Create starting space for output (guess double the input size, will grow
    // if needed -- in an extreme case, could end up needing more than 1000
    // times the input size)
    NSUInteger space = left << 1;
    if (space < left)
        space = NSUIntegerMax;
    NSMutableData *decompressed = [NSMutableData dataWithLength: space];
    space = [decompressed length];

    // Initialize output
    strm.next_out = (Bytef *)[decompressed mutableBytes];
    NSUInteger have = 0;                    // output generated so far

    // Set up for gzip decoding
    strm.avail_in = 0;
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    int status = inflateInit2(&strm, (15+16));
    if (status != Z_OK)
        return nil;                         // out of memory

    // Decompress all of self
    do {
        // Allow for concatenated gzip streams (per RFC 1952)
        if (status == Z_STREAM_END)
            (void)inflateReset(&strm);

        // Provide input for inflate
        if (strm.avail_in == 0) {
            strm.avail_in = left > UINT_MAX ? UINT_MAX : (unsigned)left;
            left -= strm.avail_in;
        }

        // Decompress the available input
        do {
            // Allocate more output space if none left
            if (space == have) {
                // Double space, handle overflow
                space <<= 1;
                if (space < have) {
                    space = NSUIntegerMax;
                    if (space == have) {
                        // space was already maxed out!
                        (void)inflateEnd(&strm);
                        return nil;         // output exceeds integer size
                    }
                }

                // Increase space
                [decompressed setLength: space];
                space = [decompressed length];

                // Update output pointer (might have moved)
                strm.next_out = (Bytef *)[decompressed mutableBytes] + have;
            }

            // Provide output space for inflate
            strm.avail_out = space - have > UINT_MAX ? UINT_MAX :
                             (unsigned)(space - have);
            have += strm.avail_out;

            // Inflate and update the decompressed size
            status = inflate (&strm, Z_SYNC_FLUSH);
            have -= strm.avail_out;

            // Bail out if any errors
            if (status != Z_OK && status != Z_BUF_ERROR &&
                status != Z_STREAM_END) {
                (void)inflateEnd(&strm);
                return nil;                 // invalid gzip stream
            }

            // Repeat until all output is generated from provided input (note
            // that even if strm.avail_in is zero, there may still be pending
            // output -- we're not done until the output buffer isn't filled)
        } while (strm.avail_out == 0);

        // Continue until all input consumed
    } while (left || strm.avail_in);

    // Free the memory allocated by inflateInit2()
    (void)inflateEnd(&strm);

    // Verify that the input is a valid gzip stream
    if (status != Z_STREAM_END)
        return nil;                         // incomplete gzip stream

    // Set the actual length and return the decompressed data
    [decompressed setLength: have];
    return decompressed;
}
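
The handling of concatenated gzip streams in the method above can be illustrated with a short Python sketch. It is an analogue rather than a translation: where the Objective-C code calls inflateReset() after each Z_STREAM_END, this sketch creates a fresh decompression object for each member and continues with the unused_data left over from the previous one. The name inflate_all_members is hypothetical.

```python
import gzip
import zlib


def inflate_all_members(data):
    """Hypothetical helper: inflate one or more concatenated gzip members."""
    if not data:
        raise ValueError("incomplete gzip stream")  # empty input
    out = []
    while data:
        d = zlib.decompressobj(15 + 16)  # gzip decoding, one member at a time
        out.append(d.decompress(data))
        if not d.eof:
            # Input ran out before the end of this member.
            raise ValueError("incomplete gzip stream")
        data = d.unused_data  # bytes after this member, if any
    return b"".join(out)


first = gzip.compress(b"first ")
second = gzip.compress(b"second")
combined = inflate_all_members(first + second)
print(combined)
```

If the bytes following a member are not another gzip member, the next decompression object raises zlib.error, which plays the role of the "invalid gzip stream" return in the method above.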

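Answer 1: (score: 2)

Yes, it looks like a bug. According to the linked example, Z_BUF_ERROR merely indicates that no more output can be produced unless inflate() is given more input; it is not in itself a reason to abort the inflate loop abnormally.

In fact, the linked example seems to handle Z_BUF_ERROR exactly the same as Z_OK.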