我能够压缩所有类型的文件(.jpg,.mp4等)但是当我尝试解压缩这些非文本文件时,程序只返回一个空的解压缩文件...奇怪的是我是能够解压缩纯文本文件就好了。
当我压缩原始文件时,我将重建树所需的数据和编码位放在同一个文件中。格式如下所示:
<n><value 1><frequency 1>...<value n><frequency n>[the compressed bytes]
其中n是唯一字节的总数(树中AKA的叶子数),value是字节形式的叶子值,频率是每个字节/“字符”的频率(频率是int值,所以它由每个频率4个字节组成。
我的代码中的BitFileReader和BitFileWriter只是BufferedOutStream / InputStream的包装类,具有逐位读/写的附加功能。我在下面添加了我的整个霍夫曼代码,但主要关注的是底部的compress()和decompress()方法。至少我想知道我的这些方法的逻辑是否正常,如果是这样,是什么导致它在解压缩其他文件类型(不是纯文本文件)时返回空的解压缩文件?
霍夫曼代码:
public class HuffmanCode {
public static Tree buildTree(int[] charFreqs) {
PriorityQueue<Tree> trees = new PriorityQueue<Tree>();
for (int i = 0; i < charFreqs.length; i++){
if (charFreqs[i] > 0){
trees.offer(new Leaf(charFreqs[i], i));
}
}
//assert trees.size() > 0;
while (trees.size() > 1) {
Tree a = trees.poll();
Tree b = trees.poll();
trees.offer(new Node(a, b));
}
return trees.poll();
}
public static void printStruct(Tree tree) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
System.out.println(leaf.value + " " + leaf.frequency);
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
printStruct(node.left);
// traverse right
printStruct(node.right);
}
}
public static void printStruct(Tree tree, StringBuffer prefix) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
System.out.println(leaf.value + "\t" + leaf.frequency + "\t" + prefix);
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
prefix.append('0');
printStruct(node.left, prefix);
prefix.deleteCharAt(prefix.length()-1);
// traverse right
prefix.append('1');
printStruct(node.right, prefix);
prefix.deleteCharAt(prefix.length()-1);
}
}
public static void fillEncodeMap(Tree tree, StringBuffer prefix, TreeMap<Integer, String> treeMap) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
treeMap.put(leaf.value, prefix.toString());
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
prefix.append('0');
fillEncodeMap(node.left, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
// traverse right
prefix.append('1');
fillEncodeMap(node.right, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
}
}
public static void fillDecodeMap(Tree tree, StringBuffer prefix, TreeMap<String, Integer> treeMap) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
treeMap.put(prefix.toString(), leaf.value);
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
prefix.append('0');
fillDecodeMap(node.left, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
// traverse right
prefix.append('1');
fillDecodeMap(node.right, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
}
}
public static void compress(File file){
try {
Path path = Paths.get(file.getAbsolutePath());
byte[] content = Files.readAllBytes(path);
TreeMap<Integer, String> encodeMap = new TreeMap<Integer, String>();
File nF = new File(file.getName()+"_comp");
nF.createNewFile();
BitFileWriter bfw = new BitFileWriter(nF);
int[] charFreqs = new int[256];
// read each byte and record the frequencies
for (byte b : content){
charFreqs[b&0xFF]++;
}
// build tree
Tree tree = buildTree(charFreqs);
// build TreeMap
fillEncodeMap(tree, new StringBuffer(), encodeMap);
// Writes tree structure in binary form to nF (new file)
bfw.writeByte(encodeMap.size());
for(int i=0; i<charFreqs.length; i++){
if(charFreqs[i] != 0){
ByteBuffer b = ByteBuffer.allocate(4);
b.putInt(charFreqs[i]);
byte[] result = b.array();
bfw.writeByte(i);
for(int j=0; j<4;j++){
bfw.writeByte(result[j]&0xFF);
}
}
}
// Write compressed data
for(byte b : content){
String code = encodeMap.get(b&0xFF);
for(char c : code.toCharArray()){
if(c == '1'){
bfw.write(1);
}
else{
bfw.write(0);
}
}
}
bfw.close();
System.out.println("Compression successful!");
} catch (IOException e) {
e.printStackTrace();
}
}
public static void decompress(File file){
try {
BitFileReader bfr = new BitFileReader(file);
int[] charFreqs = new int[256];
TreeMap<String, Integer> decodeMap = new TreeMap<String, Integer>();
File nF = new File(file.getName()+"_decomp");
nF.createNewFile();
BitFileWriter bfw = new BitFileWriter(nF);
DataInputStream data = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
int uniqueBytes;
int counter = 0;
int byteCount = 0;
uniqueBytes = data.readUnsignedByte();
// Read frequency table
while (counter < uniqueBytes){
int index = data.readUnsignedByte();
int freq = data.readInt();
charFreqs[index] = freq;
counter++;
}
// build tree
Tree tree = buildTree(charFreqs);
// build TreeMap
fillDecodeMap(tree, new StringBuffer(), decodeMap);
// Skip BitFileReader position to actual compressed code
bfr.skip(uniqueBytes*5);
// Get total number of compressed bytes
for(int i=0; i<charFreqs.length; i++){
if(charFreqs[i] > 0){
byteCount += charFreqs[i];
}
}
// Decompress data and write
counter = 0;
StringBuffer code = new StringBuffer();
while(bfr.hasNextBit() && counter < byteCount){
code.append(""+bfr.nextBit());
if(decodeMap.containsKey(code.toString())){
bfw.writeByte(decodeMap.get(code.toString()));
code.setLength(0);
counter++;
}
}
bfw.close();
bfr.close();
data.close();
System.out.println("Decompression successful!");
}
catch (IOException e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
File f = new File("test");
compress(f);
f = new File("test_comp");
decompress(f);
}
}
编辑:我找到了原因,但我不知道如何解决它或为什么会发生。问题是我的解压缩方法中的charFreqs []数组永远不会被填充(它的所有值都是零AKA所有字节根据数组都没有频率)。
答案 0 :(得分:2)
我解决了!问题是bfw.writeByte(encodeMap.size())
方法中的compress()
行。它只会向文件写入字节,但encodeMap.size()
函数如果已满,则返回值256。 256是一个比一个字节可以容纳的值更高的值(bfw.writeByte()
实际上将int作为参数但是它只写入int的8个最低位,基本上只写一个字节可以容纳的位,所以在某种程度上它实际上有一个无符号字节的范围0-255)。
我通过更改两行代码解决了这个问题。我的bfw.writeByte(encodeMap.size())
方法中的行compress()
已更改为bfw.writeByte(encodeMap.size()-1)
,uniqueBytes = data.readUnsignedByte()
方法中的行decompress()
已更改为uniqueBytes = data.readUnsignedByte() + 1