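I'm writing a Huffman compressor for homework. I managed to build the Huffman tree and the 0/1 codes for every char, but the output file is larger than the original. There is a similar question, "Unable to compress file during Huffman Encoding in Java", but I didn't understand it. My code: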
this.HuffmanTreeBulid(); // create the Huffman tree
HuffmanNode root = tree;
this.codeGenerator(root, codes); // create the hash map of codes
try
{
    FileOutputStream out2 = new FileOutputStream(fileOut); // for the new file
    FileInputStream in = new FileInputStream(fileInput); // for reading the original file again
    FileWriter out = new FileWriter(fileOut);
    //String code;
    char currentchar;
    int currentByte; // int for iterating over all the bytes of the file
    if(!fileOut.exists()) // if the new file exists, replace it; if not, create it
        fileOut.createNewFile();
    else
    {
        fileOut.delete();
        fileOut.createNewFile();
    }
    while((currentByte = in.read()) != -1)
    {
        int currentint = currentByte & 0xff; // "& 0xff" is for unsigned int
        currentchar = (char)currentint;
        byte[] c = (huffmanCodes.get(currentchar)).getBytes();
        //out.write(huffmanCodes.get(code2));
        //out.write(huffmanCodes.get(currentchar)); // for FileWriter
        out2.write(c);
    }
    in.close();
    out.close();
    out2.close();
}
catch (IOException e)
{
    e.printStackTrace();
}
Update 1: I understood the problem, so I did this:
int bitIndex = 0;
for (int i = 0; i < codes.length(); i++)
{
    if (codes.charAt(i) == '1')
        buffer.set(bitIndex++);
    else
        buffer.clear(bitIndex++);
}
Still not working :(
Update 2: I did this to get the bytes from the string:
byte[] bytes = new BigInteger(binaryString, 2).toByteArray();
for (byte b : bytes)
{
    out2.write(b);
}
It still doesn't work, but this is the closest I have gotten so far. Maybe the bytes are fine, but I'm writing them the wrong way?
Answer 0 (score: 2)
The problem is the following line:
byte[] c=(huffmanCodes.get(currentchar)).getBytes();
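You are trying to write the encoded string out as raw bits and bytes. But getBytes() only returns the byte sequence of the string encoded in the platform's default charset. So you probably get the UTF-8 byte encoding of the character '1' and the UTF-8 byte encoding of the character '0', which means every bit of a code takes up a whole byte in the output file.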
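To make the effect concrete, here is a minimal sketch of what getBytes() produces for a code string. The class name GetBytesDemo, the helper asWritten and the code "0101" are made up for illustration; the charset is passed explicitly only so that the result does not depend on the platform default.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class GetBytesDemo {
    // Returns the bytes that would be written to the file for one code string.
    static byte[] asWritten(String code) {
        // Explicit charset, so the result is the same on every platform.
        return code.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String code = "0101"; // hypothetical 4-bit Huffman code
        byte[] written = asWritten(code);
        // '0' is encoded as byte 48 and '1' as byte 49.
        System.out.println(Arrays.toString(written)); // [48, 49, 48, 49]
        System.out.println(code.length() + " bits of code -> "
                + written.length * 8 + " bits on disk");
    }
}
```

A 4-bit code turns into 4 bytes, so the output grows by a factor of eight compared with real bit packing.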
You have to parse the String into a byte. You can see how to do that here:
java: convert binary string to int
or here: How to convert binary string to a byte?
You can read more about the getBytes method here: https://beginnersbook.com/2013/12/java-string-getbytes-method-example/
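As @9000 said, you don't have a bitstream.
For a compressor, a bitstream is probably more appropriate than whole bytes. Parsing each code into a full byte of its own will not compress your data, because every char would still take up as much space as a char.
What you can do is concatenate the generated binary strings and then parse the resulting string into bytes at the end. Watch out for trailing zeros.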
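The concatenate-then-parse idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the class BinaryStringPacker, the method pack and the three sample codes are hypothetical, bits are stored most significant bit first, and the last byte is padded with zero bits.

```java
final class BinaryStringPacker {
    /**
     * Packs a string of '0'/'1' characters into bytes, most significant bit
     * first. If the length is not a multiple of 8, the last byte is padded
     * with zero bits on the right.
     */
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c == '1') {
                out[i / 8] |= 0x80 >> (i % 8); // set the bit for position i
            } else if (c != '0') {
                throw new IllegalArgumentException("not a binary digit: " + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String bits = "110" + "0" + "10"; // three hypothetical codes, concatenated
        byte[] packed = pack(bits);
        // 110010 padded to 11001000, which is 200.
        System.out.println(packed.length + " byte(s), first = "
                + (packed[0] & 0xFF) + ", valid bits = " + bits.length());
    }
}
```

The padding bits cannot be told apart from real trailing zero bits, so the number of valid bits has to be stored next to the data for the decoder. Packing the string position by position also keeps leading zeros, which the BigInteger approach from the question loses because it treats the string as a number.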
Answer 1 (score: 1)
I'd suggest adding something like this:
class BitstreamPacker {
    private int bitPos; // Actual values 0..7; where to add the next bit.
    private ArrayList<Byte> data;

    public void addBit(boolean bit) {
        // Add the bit to the last byte of data; allocate more if does not fit.
        // Adjusts bitPos as it goes.
    }

    public void writeBytes(OutputStream output) {
        // Writes the number of bytes, then the last bit pos, then the bytes.
    }
}
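One possible way to fill in the packer skeleton is sketched below. It is not the only option, and several details are assumptions made for this example: a growable byte[] replaces the ArrayList<Byte>, bits are filled from the least significant bit upward, and the header is a 4-byte big-endian byte count followed by one byte holding the last bit position. The helpers addCode and toByteArray are additions for convenience.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class BitstreamPacker {
    private byte[] data = new byte[16]; // grows on demand
    private int byteCount = 0;          // number of bytes in use
    private int bitPos = 0;             // 0..7: where the next bit goes

    void addBit(boolean bit) {
        if (bitPos == 0) { // the previous byte is full (or none exists yet)
            if (byteCount == data.length) {
                data = Arrays.copyOf(data, data.length * 2);
            }
            data[byteCount++] = 0;
        }
        if (bit) {
            data[byteCount - 1] |= 1 << bitPos;
        }
        bitPos = (bitPos + 1) % 8;
    }

    // Convenience helper (an addition): appends a code such as "0110".
    void addCode(String code) {
        for (int i = 0; i < code.length(); i++) {
            addBit(code.charAt(i) == '1');
        }
    }

    // Assumed layout: byte count (4 bytes), last bit position (1 byte), data.
    // A last bit position of 0 means the last byte is completely used.
    void writeBytes(OutputStream output) throws IOException {
        DataOutputStream out = new DataOutputStream(output);
        out.writeInt(byteCount);
        out.writeByte(bitPos);
        out.write(data, 0, byteCount);
        out.flush();
    }

    // Same content as writeBytes, returned as an array.
    byte[] toByteArray() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeBytes(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        BitstreamPacker packer = new BitstreamPacker();
        packer.addCode("110001111"); // 9 bits: one full byte plus one bit
        System.out.println(Arrays.toString(packer.toByteArray()));
        // [0, 0, 0, 2, 1, -29, 1]  (-29 is 227 as a signed byte)
    }
}
```

In the compressor loop from the question, the call `out2.write(c)` would then be replaced by something like `packer.addCode(huffmanCodes.get(currentchar))`, with a single `packer.writeBytes(out2)` after the loop.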
Similarly,
class BitstreamUnpacker {
    private byte[] data; // Or ArrayList if you wish.
    private int currentBytePos;
    private int currentBitPos; // Could be enough to track the global bit position.

    public static BitstreamUnpacker fromByteStream(InputStream input) {
        // A factory method; reads the stream and creates an instance.
        // Uses the byte count to allocate the right amount of bytes;
        // uses the bit count to limit the last byte to the actual number of bits.
        return ...;
    }

    public Boolean getNextBit() {
        // Reads bits sequentially from the internal data.
        // Returns null when the end of data is reached.
        // Or feel free to implement an iterator / iterable.
    }
}
Note that the bitstream may end in the middle of a byte, so you need to store the number of bits used in the last byte.
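A matching reader could look roughly like the sketch below. The stream layout is an assumption chosen for this example: a 4-byte big-endian byte count, then one byte with the last bit position (0 meaning the last byte is full), then the data, with bits stored from the least significant bit upward. The helper fromBytes is an addition for convenience, and a single global bit position is tracked instead of separate byte and bit positions.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class BitstreamUnpacker {
    private final byte[] data;
    private final long totalBits; // number of valid bits in data
    private long position = 0;    // global position of the next bit

    private BitstreamUnpacker(byte[] data, int lastBitPos) {
        this.data = data;
        // A last bit position of 0 means the last byte is completely used.
        int bitsInLastByte = (lastBitPos == 0) ? 8 : lastBitPos;
        this.totalBits = (data.length == 0)
                ? 0
                : (data.length - 1) * 8L + bitsInLastByte;
    }

    // Factory method: reads the header, then exactly byteCount data bytes.
    static BitstreamUnpacker fromByteStream(InputStream input) throws IOException {
        DataInputStream in = new DataInputStream(input);
        int byteCount = in.readInt();
        int lastBitPos = in.readUnsignedByte();
        byte[] data = new byte[byteCount];
        in.readFully(data);
        return new BitstreamUnpacker(data, lastBitPos);
    }

    // Convenience helper (an addition) for in-memory data.
    static BitstreamUnpacker fromBytes(byte[] serialized) {
        try {
            return fromByteStream(new ByteArrayInputStream(serialized));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the next bit, or null when the end of the data is reached.
    Boolean getNextBit() {
        if (position >= totalBits) {
            return null;
        }
        int current = data[(int) (position / 8)];
        boolean bit = ((current >> (int) (position % 8)) & 1) == 1;
        position++;
        return bit;
    }

    public static void main(String[] args) {
        // Two data bytes (227 and 1), with only one bit used in the last one.
        byte[] serialized = {0, 0, 0, 2, 1, (byte) 227, 1};
        BitstreamUnpacker unpacker = fromBytes(serialized);
        StringBuilder bits = new StringBuilder();
        for (Boolean b = unpacker.getNextBit(); b != null; b = unpacker.getNextBit()) {
            bits.append(b ? '1' : '0');
        }
        System.out.println(bits); // 110001111
    }
}
```

A decoder would walk the Huffman tree from the root, moving left or right for each bit returned by getNextBit() and emitting a character whenever it reaches a leaf.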
To help you understand the idea better, here is some Python code (because Python is easy to play with interactively):
class BitstreamPacker(object):
    def __init__(self):
        self.data = []  # A list of bytes.
        self.bit_offset = 0  # 0..7.

    def add_bit(self, bit):
        if self.bit_offset == 0:  # We must begin a new byte.
            self.data.append(0)  # Append a new byte.
        # We use addition because we know that the bit we're affecting is 0.
        # [-1] means last element.
        self.data[-1] += (bit << self.bit_offset)
        self.bit_offset += 1
        if self.bit_offset > 7:  # We've exceeded one byte.
            self.bit_offset = 0  # Shift the offset to the beginning of a byte.

    def get_bytes(self):
        # Just returning the data instead of writing, to simplify interactive use.
        return (len(self.data), self.bit_offset, self.data)
Here is how it behaves in the Python REPL:
>>> bp = BitstreamPacker()
>>> bp.add_bit(1)
>>> bp.add_bit(1)
>>> bp.get_bytes()
(1, 2, [3]) # One byte, two bits in it are used.
>>> bp.add_bit(0)
>>> bp.add_bit(0)
>>> bp.add_bit(0)
>>> bp.add_bit(1)
>>> bp.add_bit(1)
>>> bp.add_bit(1)
>>> bp.get_bytes()
(1, 0, [227]) # Whole 8 bits of one byte were used.
>>> bp.add_bit(1)
>>> bp.get_bytes()
(2, 1, [227, 1]) # Two bytes used: one full, and one bit in the next.
>>> assert 0b11100011 == 227 # The binary we sent matches.
>>> _
I hope this helps.