Zlib compression in Python

Time: 2018-03-22 15:32:36

Tags: python, compression

Why is the size of the compressed string larger? Doesn't the zlib need to compress??

Example:

import zlib
import sys

str1 = "abcdefghijklmnopqrstuvwxyz"
print "size1: ", sys.getsizeof(str1)

print "size2: ", sys.getsizeof(zlib.compress(str1))

Output:

size1:  47
size2:  55

2 answers:

Answer 0 (score: 2)

You are going to have a hard time compressing a string like this. It is quite short and contains 26 unique characters. Compressors work by assigning byte values to commonly used words, characters, and so on, so a string in which every character is unique gives poor results.

You will also get poor results if the data is random.

Here is an example that compresses a string of the same length:

>>> str2 = 'a'*26
>>> str2
'aaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> sys.getsizeof(str2)
63
>>> sys.getsizeof(zlib.compress(str2))
48
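Note that the snippets above are Python 2, and sys.getsizeof includes the Python object header, not just the string's bytes. Here is a minimal Python 3 sketch of the same comparison (an illustrative addition, not part of the original answer), using len() to measure the raw payloads; exact compressed sizes may vary with the zlib version:

```python
import zlib

# len() reports the raw byte count; sys.getsizeof() would also include
# the Python object header, inflating both numbers.
unique = b"abcdefghijklmnopqrstuvwxyz"  # 26 distinct bytes, no redundancy
repeated = b"a" * 26                    # one byte repeated 26 times

print(len(unique), len(zlib.compress(unique)))      # compressed is larger
print(len(repeated), len(zlib.compress(repeated)))  # compressed is smaller
```

The repeated string shrinks because deflate can encode "the byte 'a', then copy it 25 more times"; the all-unique string has nothing to reuse, so the zlib header and checksum are pure overhead.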

Answer 1 (score: 1)

Grant's answer is fine, but something here needs to be emphasized.

Doesn't the zlib need to compress??

No! It does not, and cannot always compress. Any operation that losslessly compresses and decompresses its input must expand some, in fact most, inputs, while compressing only some. This is a simple and obvious consequence of counting: there are fewer possible short outputs than there are possible inputs, so not every input can map to a shorter output.
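The counting argument is easy to check empirically. This sketch (my addition, not part of the original answer) compresses a block of random bytes; random data has no redundancy for deflate to exploit, so the output carries the zlib header and checksum as pure overhead:

```python
import os
import zlib

# 1 KiB of random bytes: no patterns for deflate to find.
data = os.urandom(1024)
compressed = zlib.compress(data, 9)  # level 9 = best compression

# Even at the best setting, deflate falls back to "stored" blocks for
# incompressible input, so header and checksum make the output larger.
print(len(data), len(compressed))
```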

The only thing that is guaranteed by a lossless compressor is that what you get out from decompression is what you put in to compression.

Any useful compression scheme is designed to take advantage of the specific redundancies expected in the particular kind of data being compressed. Language data, e.g. English text, C code, data files, even machine code, is a sequence of symbols with a specific frequency distribution and often-repeated strings, and is compressed using models that expect and look for those redundancies. Such schemes depend on gathering statistics from the data being compressed, over at least the first tens of kilobytes, before the compression starts being really effective.
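To illustrate the point about needing enough data, here is a small sketch (an illustrative addition, not from the answer) that compresses increasing amounts of repetitive English text; the compression ratio improves sharply once deflate has accumulated history to match against:

```python
import zlib

sentence = b"the quick brown fox jumps over the lazy dog. "
for reps in (1, 10, 100, 1000):
    data = sentence * reps
    ratio = len(zlib.compress(data)) / len(data)
    # The ratio drops as the input grows and repetitions accumulate.
    print(f"{len(data):6d} bytes -> ratio {ratio:.3f}")
```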

Your example is far too short to provide the statistics needed, and has no repetition of any kind, so it will be expanded by any general-purpose compressor.