Question

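I'm working on a course project on the Huffman algorithm. After reading a file and generating the Huffman codes (1s & 0s), I have to export them to a new file using bitwise operations. For some reason, when I export with bitwise operations, the file ends up larger than before. The 1s and 0s represent the original characters, and with bitwise operations I have to pack them into chains of 8 bits. Here is my code: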

byte currentByte = 0;
for (int i = 0, j = 0; i < binaryString.length(); i++, j++) {
    if (binaryString.charAt(i) == '1') {
        currentByte |= (byte) Math.pow(2, 7 - j);
    }
    if (i != 0 && (i % 8 == 0 || i == binaryString.length() - 1)) {
        output.writeObject(currentByte);
        if (i % 8 == 0) {
             currentByte = 0;
             j = 0;
        }
    }
}

Thanks.

Answer 1

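You are using an ObjectOutputStream, which is meant for portable serialization of Java objects. If you want to write individual bytes, you should use a FileOutputStream instead.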

Answer 2

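The problem is that you are using the writeObject method instead of the write method.

The writeObject method writes information about the object as well as the object itself, whereas the write method is designed to simply write a single byte.

You should also use a FileOutputStream instead of an ObjectOutputStream.

See: ObjectStream.write(byte)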
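To make the size difference concrete, here is a minimal sketch that writes the same bytes both ways into an in-memory stream and compares the resulting sizes. The class and method names (StreamSizeDemo, sizeWithWriteObject, sizeWithWrite) are made up for illustration, and a ByteArrayOutputStream stands in for the file so nothing touches the disk.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class StreamSizeDemo {
    // Serializes each byte as a boxed Byte object, like the code in the question.
    static int sizeWithWriteObject(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
            for (byte b : data) {
                out.writeObject(b); // autoboxed to Byte, written with serialization metadata
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    // Writes each byte as exactly one raw byte.
    static int sizeWithWrite(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        for (byte b : data) {
            sink.write(b); // OutputStream.write(int) stores only the low 8 bits
        }
        return sink.size();
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xCC, 0x33};
        System.out.println("write:       " + sizeWithWrite(data) + " bytes");
        System.out.println("writeObject: " + sizeWithWriteObject(data) + " bytes");
    }
}
```

The raw write produces one byte of output per input byte, while the serialized form also carries a stream header and class information, which is why the "compressed" file grows.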

Answer 3

public static void main(String[] args) throws IOException
{
    FileOutputStream output = new FileOutputStream("C:\\temp\\t.dat");
    String inp = "1100110000110011";
    byte[] ar = new byte[1];
    int b = 0;
    int j = 0;
    int i = 0;
    while(i < inp.length())
    {
        if(inp.charAt(i) == '1')
            b |= 1 << (7-j);

        j++;
        i++;
        if(i % 8 == 0)
        {
            //StringBuilder sb = new StringBuilder();
            //sb.append(String.format("%02X ", b));
            //System.out.print(sb.toString());
            ar[0] = (byte)b;
            output.write(ar);
            j = 0;
            b = 0;
        }
    }
    output.close();
}

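If you are writing longer sequences, consider collecting the bytes in a List<Byte> (Java generics cannot hold the primitive byte) and writing them out together, rather than writing each byte separately.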
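As a sketch of that idea, the version below collects the packed bytes in a ByteArrayOutputStream (swapped in here for the list because it avoids boxing) and returns them as one array. BitStringPacker and pack are hypothetical names, and zero-padding the last partial byte is an assumption; the decoder would need to know how many bits are valid.

```java
import java.io.ByteArrayOutputStream;

public class BitStringPacker {
    // Packs a string of '0'/'1' characters into bytes, most significant bit first.
    static byte[] pack(String bits) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int current = 0; // bits collected so far for the byte being built
        int filled = 0;  // how many bits of 'current' are in use
        for (int i = 0; i < bits.length(); i++) {
            current = (current << 1) | (bits.charAt(i) == '1' ? 1 : 0);
            filled++;
            if (filled == 8) {
                bytes.write(current);
                current = 0;
                filled = 0;
            }
        }
        if (filled > 0) {
            // Assumption: pad the trailing partial byte with zero bits on the right.
            bytes.write(current << (8 - filled));
        }
        return bytes.toByteArray();
    }
}
```

The returned array can then be written with a single call such as output.write(BitStringPacker.pack(inp)) on the FileOutputStream.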

Answer 4

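Why would you even generate a string of 1s and 0s in the first place? It is a useless extra step that only costs extra time.

The usual approach is to use a "buffer" of some convenient number of bits (say 32, because that is an int), write a variable number of bits into that buffer for each symbol you encode, and drain whole bytes from the buffer.

For example (not tested, but I have done this before):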

int buffer = 0, bufbits = 0;
for (int i = 0; i < symbols.length; i++) // symbols is indexed as an array below
{
    int s = symbols[i];
    buffer <<= lengths[s];  // make room for the bits
    bufbits += lengths[s];  // buffer got longer
    buffer |= values[s];    // put in the bits corresponding to the symbol

    while (bufbits >= 8)    // as long as there is at least a byte in the buffer
    {
        bufbits -= 8;       // forget it's there
        stream.write((byte)(buffer >>> bufbits)); // and save it
        // note: bits are not removed from the buffer, just forgotten about
        // so it will "overflow", but that is harmless.
        // you will see weird values in the debugger though
    }
}

Don't forget that something may still be left in the buffer when the loop ends, so write that out separately.
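A minimal sketch of the whole loop including that final flush might look as follows. The class name HuffmanBitWriter, the in-memory output, and the choice to pad the last byte with zero bits are assumptions made for illustration; lengths[s] and values[s] are the code length and code bits of symbol s, as in the snippet above.

```java
import java.io.ByteArrayOutputStream;

public class HuffmanBitWriter {
    // Encodes symbols MSB-first; code lengths are assumed to be at most 25 bits.
    static byte[] encode(int[] symbols, int[] lengths, int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int buffer = 0;  // pending bits sit in the low 'bufbits' bits
        int bufbits = 0;
        for (int s : symbols) {
            buffer = (buffer << lengths[s]) | values[s];
            bufbits += lengths[s];
            while (bufbits >= 8) {
                bufbits -= 8;
                out.write(buffer >>> bufbits); // write(int) keeps only the low 8 bits
            }
        }
        if (bufbits > 0) {
            // Flush the leftover bits, left-aligned and padded with zeros (an assumption).
            out.write(buffer << (8 - bufbits));
        }
        return out.toByteArray();
    }
}
```

Because of the padding, a decoder also needs some way to know where the data ends, for example a stored symbol count or an end-of-stream symbol.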

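Some formats require the packing to go the other way around, i.e. the next symbol is placed in front of the previous symbol in the buffer. That is a simple change.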
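One way to read "the other way around" is least-significant-bit-first packing, where each new code is placed above the bits already in the buffer and bytes are drained from the bottom. The sketch below assumes that reading; the names LsbFirstBitWriter and encode are hypothetical, and whether the code values themselves must be bit-reversed depends on the target format.

```java
import java.io.ByteArrayOutputStream;

public class LsbFirstBitWriter {
    // Packs codes LSB-first; code lengths are assumed to be at most 25 bits.
    static byte[] encode(int[] symbols, int[] lengths, int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int buffer = 0;  // pending bits sit in the low 'bufbits' bits
        int bufbits = 0;
        for (int s : symbols) {
            buffer |= values[s] << bufbits; // new code goes above the pending bits
            bufbits += lengths[s];
            while (bufbits >= 8) {
                out.write(buffer & 0xFF);   // drain the oldest 8 bits
                buffer >>>= 8;
                bufbits -= 8;
            }
        }
        if (bufbits > 0) {
            out.write(buffer & 0xFF);       // unused high bits of the last byte are zero
        }
        return out.toByteArray();
    }
}
```

Unlike the MSB-first loop, the drained bits are actually shifted out here, so the buffer never holds stale bits.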

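Using 32 bits means the maximum symbol length is 32 - 7 = 25 (up to 7 bits can be left in the buffer after whole bytes are drained), which is usually larger than the other limits already placed on symbol lengths (typically 15 or 16). If you need more, using a long gives a maximum symbol length of 57. Very long codes are inconvenient when decoding (because tables are used; nobody really decodes by walking the tree bit by bit), so they are usually not used.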

Answer 5

You need to change the position of the if:

public static void main(String[] args) {
    String binaryString = "1111111100000010";
    byte currentByte = 0;
    for (int i = 0, j = 0; i < binaryString.length(); i++, j++) {
        if (i != 0 && i % 8 == 0 || i == binaryString.length() - 1) {
            System.out.println(currentByte); // for debug
            currentByte = 0;
            j = 0;
        }
        if (binaryString.charAt(i) == '1') {
            currentByte |= 1 << 7 - j;
        }
    }
}

Output for the binary string:

-1
2

Note that 11111111 is -1 when stored in a byte, because byte is a signed type.
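The signed value only affects how the byte is printed; the stored bit pattern is unchanged. As a small sketch (SignedByteDemo and toBits are illustrative names), masking with 0xFF or calling Byte.toUnsignedInt recovers the unsigned view:

```java
public class SignedByteDemo {
    // Renders a byte as its 8-bit pattern, regardless of sign.
    static String toBits(byte b) {
        String s = Integer.toBinaryString(b & 0xFF); // mask to treat the byte as 0..255
        return String.format("%8s", s).replace(' ', '0'); // left-pad to 8 digits
    }

    public static void main(String[] args) {
        byte allOnes = (byte) 0b11111111;
        System.out.println(allOnes);                     // signed view
        System.out.println(Byte.toUnsignedInt(allOnes)); // unsigned view
        System.out.println(toBits(allOnes));             // bit pattern
    }
}
```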
