Question

I'm writing a program for an assignment that has to implement LZW compression/decompression. I'm using the following algorithm:

-compression

w = NIL;
while ( read a character k )
{
    if wk exists in the dictionary
        w = wk;
    else
    {
        add wk to the dictionary;
        output the code for w;
        w = k;
    }
}
output the code for w;
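A minimal Python sketch of the compression pseudocode above, assuming (as the question describes) that the dictionary starts with the 256 single-byte strings as codes 0-255, that the input only contains characters in that range, and that the final w is flushed after the loop, as the trace below does. The function name `compress` is illustrative.

```python
def compress(text: str) -> list[int]:
    """LZW-compress text into a list of integer codes."""
    # Starting dictionary: each single character 0-255 maps to its own code.
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    w = ""
    codes = []
    for k in text:
        wk = w + k
        if wk in dictionary:
            # Keep extending the current match.
            w = wk
        else:
            # Emit the longest match, then remember the extended string.
            codes.append(dictionary[w])
            dictionary[wk] = next_code
            next_code += 1
            w = k
    # Flush whatever match is left when the input runs out.
    if w:
        codes.append(dictionary[w])
    return codes


print(compress("booop"))  # [98, 111, 257, 112]
```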

-decompression

read a character k;
output k;
w = k;
while ( read a character k )
/* k could be a character or a code. */
{
    entry = dictionary entry for k;
    output entry;
    add w + entry[0] to dictionary;
    w = entry;
}
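For reference, a direct Python translation of this decompression pseudocode (same 0-255 starting dictionary; the name `decompress_naive` is hypothetical) reproduces the failure on the `booop` codes: it assumes every code is already in the dictionary.

```python
def decompress_naive(codes: list[int]) -> str:
    """Literal translation of the decoder above, including its missing case."""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    output = [w]
    for k in codes[1:]:
        # Raises KeyError when k is the code the encoder created one step ago.
        entry = dictionary[k]
        output.append(entry)
        dictionary[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(output)


try:
    decompress_naive([98, 111, 257, 112])
except KeyError as err:
    print("missing code:", err)  # missing code: 257
```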

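For the compression stage, I simply output ints that are the indices of the dictionary entries; the starting dictionary contains the ASCII characters (0 - 255). But when I get to the decompression stage, I get this error. For example, if I compress a text file containing only "booop", it goes through these steps to produce the output file: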

w       k       Dictionary          Output

-       b       -                   -
b       o       bo (256)            98 (b)
o       o       oo (257)            111 (o)
o       o       -                   -
oo      p       oop (258)           257 (oo)
p       -       -                   112 (p)

output.txt: 98 111 257 112

Then I decompress the file:

w       k          entry        output       Dictionary
-       98 (b)     -            b            -
b       111 (o)    o            o             bo (256)
o       257 (error)

257 (oo) has not been added yet. Can anyone see where I'm going wrong? I'm stumped. Is the algorithm wrong?

Answer 1

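Your compression part is correct and complete, but the decompression part is not. You only handle the case where the code is already in the dictionary. Since the decompression process is always one step behind the compression process, the decoder can receive a code that is not yet in its dictionary. But because it is only one step behind, it can work out what the encoder will add next, output the decoded string correctly, and then add it to the dictionary. Continue the decompression like this: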

-decompression

read a character k;
output k;
w = k;
while ( read a character k )
/* k could be a character or a code. */
{
    if k exists in the dictionary
    {
        entry = dictionary entry for k;
        output entry;
        add w + entry[0] to dictionary;
        w = entry;
    }
    else
    {
        entry = w + firstCharacterOf(w);
        output entry;
        add entry to dictionary;
        w = entry;
    }
}
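A minimal Python sketch of the corrected decompression loop, assuming the same 0-255 starting dictionary and integer codes as the question (the function name `decompress` is illustrative). One deliberate deviation from the pseudocode: an unknown code is only accepted if it equals the next code to be assigned; any other unknown code raises `ValueError`, since it cannot come from a valid encoder.

```python
def decompress(codes: list[int]) -> str:
    """LZW-decompress a list of integer codes back into text."""
    if not codes:
        return ""
    # Same starting dictionary as the encoder: codes 0-255 are single characters.
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    output = [w]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        elif k == next_code:
            # The encoder created this code one step ago, so it must be w + w[0].
            entry = w + w[0]
        else:
            raise ValueError(f"bad LZW code: {k}")
        output.append(entry)
        # Mirror the encoder's update: previous string + first char of this one.
        dictionary[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(output)


print(decompress([98, 111, 257, 112]))  # booop
```

In the unknown-code branch, `w + entry[0]` equals `entry`, so the single dictionary update covers both branches of the pseudocode.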

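Then, when you decompress the file and reach 257, you find that it is not in the dictionary. But you know the previous entry was 'o', and its first character is also 'o'; put them together and you get "oo". Now output oo and add it to the dictionary as 257. Next you get code 112, and you know for sure it is p. DONE!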

w       k          entry        output       Dictionary
-       98 (b)     -            b             -
b       111 (o)    o            o             bo (256)
o       257 (oo)   oo           oo            oo (257)
oo      112 (p)    p            p             oop (258)
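To double-check a table like this one, the hypothetical `trace` helper below replays the corrected decoding loop in Python (same 0-255 starting dictionary, an assumption carried over from the question) and records one row per code read. It also shows that reading 112 adds `oop` as code 258, matching what the encoder added.

```python
def trace(codes: list[int]) -> list[tuple]:
    """Replay the corrected decoder and record (w, k, entry, added) per code."""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    rows = [("-", codes[0], w, None)]  # first code: output it, nothing is added
    for k in codes[1:]:
        # Unknown code: it must be w + w[0], the case described above.
        entry = dictionary[k] if k in dictionary else w + w[0]
        dictionary[next_code] = w + entry[0]
        rows.append((w, k, entry, (next_code, dictionary[next_code])))
        next_code += 1
        w = entry
    return rows


for row in trace([98, 111, 257, 112]):
    print(row)
# ('-', 98, 'b', None)
# ('b', 111, 'o', (256, 'bo'))
# ('o', 257, 'oo', (257, 'oo'))
# ('oo', 112, 'p', (258, 'oop'))
```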

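See Steve Blackstock's explanation for more information. There is also a better page with flow charts of the actual decoder and encoder implementations on which the GIF encoder and decoder of the "icafe" Java image library are based.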

Answer 2

From http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch: is this the situation you are in?

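What happens if the decoder receives a code Z that is not yet in its dictionary? Since the decoder is always just one code behind the encoder, Z can be in the encoder's dictionary only if the encoder just generated it, while emitting the previous code X for χ. Thus Z codes some ω that is χ + ?, and the decoder can determine the unknown character as follows: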

1) The decoder sees X and then Z.
2) It knows X codes the sequence χ and Z codes some unknown sequence ω.
3) It knows the encoder just added Z to code χ + some unknown character,
4) and it knows that the unknown character is the first letter z of ω.
5) But the first letter of ω (= χ + ?) must then also be the first letter of χ.
6) So ω must be χ + x, where x is the first letter of χ.
7) So the decoder figures out what Z codes even though it's not in the table,
8) and upon receiving Z, the decoder decodes it as χ + x, and adds χ + x to the table as the value of Z.
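The eight steps above collapse into one small rule. Here is a Python sketch of it as a hypothetical helper `resolve_code`, where `chi` plays the role of χ (the string decoded for the previous code X). It assumes codes are numbered 0, 1, 2, ... with no gaps, so `len(dictionary)` is the next code the decoder would assign.

```python
def resolve_code(z: int, dictionary: dict[int, str], chi: str) -> str:
    """Return the string for code z, given chi = the string decoded for X.
    Assumes dense numbering, so len(dictionary) is the next code to assign."""
    if z in dictionary:
        return dictionary[z]
    next_code = len(dictionary)
    if z != next_code:
        # Only the code the encoder created while emitting X can be missing.
        raise ValueError(f"code {z} is not decodable yet (next code is {next_code})")
    # Steps 5-6: omega = chi + x, where x is the first letter of chi.
    return chi + chi[0]


table = {i: chr(i) for i in range(256)}
table[256] = "bo"  # added after reading 111 in the question's trace
print(resolve_code(257, table, "o"))  # oo
```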

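This situation occurs whenever the encoder encounters input of the form cScSc, where c is a single character, S is a string, and cS is already in the dictionary but cSc is not. The encoder emits the code for cS, putting a new code for cSc into the dictionary. Next it sees cSc in the input (starting at the second c of cScSc) and emits the new code it just inserted. The argument above shows that whenever the decoder receives a code that is not in its dictionary, the situation must look like this.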
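A quick way to see the cScSc condition in action is to instrument a decoder so it reports which codes arrived before they were in its dictionary. This Python sketch assumes the same 0-255 starting dictionary as the question; `compress` and `unknown_codes` are illustrative names. In "abababa", c = "a" and S = "b": "ab" (cS) already has code 256 when the run "ababa" (cScSc) starts at index 2. In "booop", c = "o" and S is empty.

```python
def compress(text: str) -> list[int]:
    dictionary = {chr(i): i for i in range(256)}
    w, codes = "", []
    for k in text:
        if w + k in dictionary:
            w += k
        else:
            codes.append(dictionary[w])
            dictionary[w + k] = len(dictionary)  # next free code
            w = k
    if w:
        codes.append(dictionary[w])
    return codes


def unknown_codes(codes: list[int]) -> list[int]:
    """Decode codes; return those not yet in the decoder's dictionary."""
    dictionary = {i: chr(i) for i in range(256)}
    w = dictionary[codes[0]]
    missing = []
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        else:
            missing.append(k)
            entry = w + w[0]  # the cScSc case
        dictionary[len(dictionary)] = w + entry[0]
        w = entry
    return missing


codes = compress("abababa")
print(codes, unknown_codes(codes))  # [97, 98, 256, 258] [258]
```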

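Although input of the form cScSc might seem unlikely, this pattern is fairly common when the input stream is characterized by significant repetition. In particular, long strings of a single character (which are common in the kinds of images LZW is often used to encode) repeatedly generate patterns of this sort.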
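To illustrate the point about long runs, this Python sketch (same 0-255 starting dictionary assumption; function names are illustrative) round-trips a run of ten identical characters and counts how often the decoder hits the not-yet-in-dictionary case. In a pure run, every code after the first is the one the encoder has just created.

```python
def compress(text: str) -> list[int]:
    dictionary = {chr(i): i for i in range(256)}
    w, codes = "", []
    for k in text:
        if w + k in dictionary:
            w += k
        else:
            codes.append(dictionary[w])
            dictionary[w + k] = len(dictionary)  # next free code
            w = k
    if w:
        codes.append(dictionary[w])
    return codes


def decompress(codes: list[int]) -> tuple[str, int]:
    """Decode codes; also count how often the unknown-code case fires."""
    dictionary = {i: chr(i) for i in range(256)}
    w = dictionary[codes[0]]
    out, hits = [w], 0
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        else:
            entry = w + w[0]
            hits += 1
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[0]
        w = entry
    return "".join(out), hits


run = "a" * 10
codes = compress(run)
text, hits = decompress(codes)
print(codes, text == run, hits)  # [97, 256, 257, 258] True 3
```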

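For your particular case the Wikipedia description fits: you have X + ?, where X is (o) and Z (257) is unknown, so the unknown character is the first letter of X, giving (oo); add (oo) to the table as 257. I'm just going by what I read on Wikipedia, so let us know how it goes if that isn't the solution.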
