Question

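I'm working on a Huffman encoding/decoding project in C, and I have a good understanding of how the algorithm should store information about the Huffman tree, rebuild the tree during decoding, and decompress to the original input file using variable-length codes.

When writing my compressed file, I will output a table of 256 4-byte integers containing the unique frequencies. I know I'll also have to figure out a way to handle EOF - I'll worry about that later.

My question is how I should carry out the bitwise operations needed to write a stream of variable-length codes as a series of 1-byte fwrite calls.

Suppose I've created the following (fictitious) codes: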

a: 001010101010011
b: 100
c: 11111
d: 0

The bitstream for "abcd" would be:

001010101010011100111110

I know that I need to use some bitwise operations to "chop" this stream into writable bytes:

00101010|10100111|00111110

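My first attempt, which created 8 different cases based on the code lengths, didn't go well, and I'm stumped. Is there an easier way to handle variable-length codes when writing to a file?

Thank you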

Answer 1

Here's some pseudocode to give you the general idea:

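#include <stdio.h>

static FILE *stream;                    /* assumed: output file opened elsewhere in "wb" mode */
static unsigned char BitBuffer = 0;     /* bits collected so far, most significant first */
static unsigned char BitsInBuffer = 0;  /* number of valid bits in BitBuffer */

/* buffer one binary digit ('1' or '0') */
static void WriteBitCharToOutput(char bitChar)
{
  if (BitsInBuffer > 7)                 /* buffer is full: write it before adding more */
  {
    fputc(BitBuffer, stream);
    BitsInBuffer = 0;
    BitBuffer = 0; /* just to be tidy */
  }

  BitBuffer = (unsigned char)((BitBuffer << 1) | (bitChar == '1' ? 1 : 0));
  BitsInBuffer++;
}

/* call after the last character has been encoded to flush out remaining bits */
static void FlushBitBuffer(void)
{
  if (BitsInBuffer > 0)
  {
    do
    {
      WriteBitCharToOutput('0'); /* pad with zeroes */
    } while (BitsInBuffer != 1); /* 1 means the full byte has just been written */
    BitsInBuffer = 0;            /* discard the one extra padding bit */
    BitBuffer = 0;
  }
}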
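As a usage sketch, the same one-bit-at-a-time idea can be tried on the codes from the question. This is a variant under stated assumptions, not the code above: it packs into a caller-supplied array instead of a file so the result is easy to inspect, it stores each byte as soon as it is complete, and the names `BitPacker`, `put_bit`, `put_code`, `finish`, and `encode_abcd` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: packs bits MSB-first into a caller-supplied array. */
typedef struct {
    unsigned char *out;   /* destination bytes */
    size_t cap;           /* capacity of out */
    size_t len;           /* complete bytes stored so far */
    unsigned char acc;    /* partially filled byte */
    int nbits;            /* number of bits currently in acc (0..7) */
} BitPacker;

static void put_bit(BitPacker *p, int bit)
{
    p->acc = (unsigned char)((p->acc << 1) | (bit & 1));
    if (++p->nbits == 8) {            /* byte complete: store it immediately */
        assert(p->len < p->cap);
        p->out[p->len++] = p->acc;
        p->acc = 0;
        p->nbits = 0;
    }
}

/* Append a code given as a string of '0'/'1' characters. */
static void put_code(BitPacker *p, const char *code)
{
    for (; *code != '\0'; code++)
        put_bit(p, *code == '1');
}

/* Pad the last partial byte with zero bits; returns total bytes produced. */
static size_t finish(BitPacker *p)
{
    while (p->nbits != 0)
        put_bit(p, 0);
    return p->len;
}

/* Encodes "abcd" with the fictitious codes from the question. */
static size_t encode_abcd(unsigned char *out, size_t cap)
{
    static const char *codes[] = { "001010101010011", "100", "11111", "0" };
    BitPacker p = { out, cap, 0, 0, 0 };
    for (size_t i = 0; i < 4; i++)
        put_code(&p, codes[i]);
    return finish(&p);
}
```

Given an array of at least 3 bytes, `encode_abcd` should fill it with 0x2A, 0xA7, 0x3E, which is the 00101010|10100111|00111110 split from the question. The array can then go to the file in a single `fwrite(out, 1, n, fp)` call.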

Answer 2

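As an alternative to the other answer, if you want to write several bits to the buffer at once, you can. It could look something like this (this is pseudocode, though it looks pretty real):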

uint32_t buffer = 0;
int bufbits = 0;
for (int i = 0; i < symbolCount; i++)
{
    int s = symbols[i];
    buffer <<= lengths[s];  // make room for the bits
    bufbits += lengths[s];  // buffer got longer
    buffer |= values[s];    // put in the bits corresponding to the symbol

    while (bufbits >= 8)    // as long as there is at least a byte in the buffer
    {
        bufbits -= 8;       // forget it's there
        writeByte((buffer >> bufbits) & 0xFF); // and save it
    }
}

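Not shown: obviously, when you're done writing, you still have to write out whatever is left over in the buffer.

This assumes a maximum code length of 25 or less. At most 7 bits can be left in the buffer, and 7 + 25 = 32 is the most that fits in a 32-bit integer. This is not a bad limitation: code lengths are usually limited to 15 or 16 anyway, to allow the simplest form of table-based decoding without needing a huge table.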
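Putting the loop, the final flush, and the 25-bit limit together, a minimal sketch might look like the following. It is one possible arrangement rather than the definitive one: the names `CodeWriter`, `write_code`, and `flush_codes` are hypothetical, output goes to a caller-supplied array rather than a `writeByte` function, and padding the last byte with zero bits is an assumption about the file format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical writer: pending bits sit in the low `bufbits` bits of `buffer`. */
typedef struct {
    uint32_t buffer;      /* bit accumulator; bits above bufbits are stale */
    int bufbits;          /* number of pending bits, 0..7 between calls */
    unsigned char *out;   /* destination bytes */
    size_t cap;           /* capacity of out */
    size_t len;           /* bytes stored so far */
} CodeWriter;

/* Append one code of `length` bits whose bits are the low bits of `value`. */
static void write_code(CodeWriter *w, uint32_t value, int length)
{
    assert(length >= 1 && length <= 25);   /* 7 pending + 25 new = 32 bits */
    assert((value >> length) == 0);        /* no bits outside the code */

    w->buffer = (w->buffer << length) | value;  /* make room, then add bits */
    w->bufbits += length;

    while (w->bufbits >= 8) {              /* emit every complete byte */
        w->bufbits -= 8;
        assert(w->len < w->cap);
        w->out[w->len++] = (unsigned char)((w->buffer >> w->bufbits) & 0xFF);
    }
}

/* Write any left-over bits, padded on the right with zeros (assumed format). */
static size_t flush_codes(CodeWriter *w)
{
    if (w->bufbits > 0) {
        int pad = 8 - w->bufbits;
        assert(w->len < w->cap);
        w->out[w->len++] = (unsigned char)((w->buffer << pad) & 0xFF);
        w->bufbits = 0;
    }
    w->buffer = 0;
    return w->len;                         /* total bytes produced */
}
```

The decoder has to know how many symbols (or bits) are valid, since the zero padding in the last byte could otherwise be mistaken for a short code such as `d: 0` in the question.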

Bitstream of variable-length Huffman codes - how to write it to a file?
