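I'm trying to figure out how to prove that the Lempel-Ziv 77 compression algorithm really does give optimal compression.
I found the following information: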
So how well does the Lempel-Ziv algorithm work? In these notes, we’ll
calculate two quantities. First, how well it works in the worst case, and
second, how well it works in the random case where each letter of the message
is chosen uniformly and independently from a probability distribution, where
the ith letter appears with probability p_i. In both cases, the compression
is asymptotically optimal. That is, in the worst case, the length of the
encoded string of bits is n + o(n). Since there is no way to compress all
length-n strings to fewer than n bits, this can be counted as asymptotically
optimal. In the second case, the source is compressed to length
$$H(p_1, p_2, \ldots, p_\alpha)\,n + o(n) = n \sum_{i=1}^{\alpha} \left(-p_i \log_2 p_i\right) + o(n)$$
which is to first order the Shannon bound.
What does this mean? Why is there no way to compress all length-n strings to fewer than n bits?
Thanks, everyone.
Answer 0 (score: 1)
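There are 2^n different strings of length n. For decompression to be possible, the compression algorithm must map every one of them to a different compressed version: if two different length-n strings compressed to the same sequence, there would be no way to tell which one to decompress to. If all of them were compressed to strings of length k < n, there would be only 2^k < 2^n different compressed strings, so there would have to be a case where two different strings compress to the same value. The same holds if the compressed lengths are allowed to vary: there are only 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1 bit strings shorter than n bits, which is still fewer than 2^n.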
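The counting argument can be checked by brute force for small n. The following is only a minimal sketch: `drop_last_bit` is a hypothetical toy "compressor" made up for illustration (it is not LZ77), and `find_collision` simply tries every n-bit input until two of them produce the same output.

```python
from itertools import product


def count_strings(length):
    """Number of distinct bit strings of exactly `length` bits."""
    return 2 ** length


def count_shorter(n):
    """Number of distinct bit strings with fewer than n bits (empty string included)."""
    return sum(2 ** k for k in range(n))


def find_collision(compress, n):
    """Return two distinct n-bit strings with the same compressed output, or None."""
    seen = {}
    for bits in product("01", repeat=n):
        s = "".join(bits)
        c = compress(s)
        if c in seen:
            return seen[c], s
        seen[c] = s
    return None


def drop_last_bit(s):
    """Hypothetical toy 'compressor': always outputs n - 1 bits."""
    return s[:-1]


n = 8
inputs = count_strings(n)    # how many n-bit inputs exist
outputs = count_shorter(n)   # how many shorter outputs are available
collision = find_collision(drop_last_bit, n)
print(inputs, outputs, collision)
```

Any other function that always returns fewer than n bits can be passed to `find_collision` in place of `drop_last_bit`; since there are fewer available outputs than inputs, it should always find a pair.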
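Note that there is no practical scheme that is guaranteed to be optimal in all cases. If I know that a long, apparently random sequence is the output of a keyed stream cipher, I also know that I can describe it by giving just the design of the cipher and the key, but it might take a compression algorithm a very long time to work out that the long, apparently random sequence can be compressed this much in this way.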
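A rough way to see this effect, as a sketch only: here Python's `random.Random` (a Mersenne Twister, not a real stream cipher) stands in for the keyed cipher, the seed plays the role of the key, and `zlib` stands in for a general-purpose LZ77-style compressor. The helper name `keystream` and the chosen seed and length are assumptions for illustration.

```python
import random
import zlib


def keystream(seed, n):
    """Stand-in for a keyed stream cipher: n pseudo-random bytes from a seed."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


seed = 12345   # plays the role of the key (arbitrary value)
n = 100_000    # length of the apparently random sequence in bytes

data = keystream(seed, n)
compressed = zlib.compress(data, 9)

# The general-purpose compressor finds essentially nothing to exploit...
print(len(data), len(compressed))

# ...yet the generator plus the seed is a complete, tiny description.
regenerated = keystream(seed, n)
print(regenerated == data)
```

The compressed size should come out at roughly the original size, even though the whole sequence is determined by a few lines of code and one small integer.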