Java - My Huffman decompression refuses to decompress non-text files (returns an empty file)

Time: 2017-03-03 11:07:24

Tags: java arrays file huffman-code compression

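I can compress all types of files (.jpg, .mp4, etc.), but when I try to decompress these non-text files, the program just returns an empty decompressed file. The strange thing is that I can decompress plain text files just fine.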

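When I compress the original file, I put the data needed to rebuild the tree and the encoded bits in the same file. The format looks like this: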

<n><value 1><frequency 1>...<value n><frequency n>[the compressed bytes]

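where n is the total number of unique bytes (a.k.a. the number of leaves in the tree), value is the leaf value in byte form, and frequency is the frequency of each byte/"character" (the frequency is an int, so each frequency takes 4 bytes).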
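This is a minimal sketch of how one <value><frequency> entry is laid out, assuming it mirrors what compress() writes and what decompress() reads back. ByteBuffer uses big-endian byte order by default, and DataInputStream.readInt() also reads big-endian, so the two sides agree on the 4-byte frequency. The class and method names (HeaderEntryDemo, encodeEntry, decodeEntry) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class HeaderEntryDemo {

    // Builds one <value><frequency> entry like compress(): 1 byte for the value,
    // then the 4-byte int from ByteBuffer (big-endian by default).
    static byte[] encodeEntry(int value, int frequency) {
        byte[] freqBytes = ByteBuffer.allocate(4).putInt(frequency).array();
        byte[] entry = new byte[5];
        entry[0] = (byte) value; // keeps only the low 8 bits
        System.arraycopy(freqBytes, 0, entry, 1, 4);
        return entry;
    }

    // Parses the entry like decompress(): readUnsignedByte, then readInt (also big-endian).
    static int[] decodeEntry(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int value = in.readUnsignedByte();
            int frequency = in.readInt();
            return new int[] {value, frequency};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(decodeEntry(encodeEntry(0xFF, 123456)))); // [255, 123456]
    }
}
```

Each entry is therefore exactly 5 bytes, which matches the bfr.skip(uniqueBytes*5) call in decompress().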

The BitFileReader and BitFileWriter in my code are just wrapper classes around BufferedOutputStream/BufferedInputStream, with the added ability to read/write bit by bit.
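BitFileReader and BitFileWriter are not shown, so here is a hypothetical sketch of what such wrappers might look like. It assumes bits are packed most-significant-bit first and that the last byte is padded with zero bits on close. The names BitWriter, BitReader, pack and unpack are illustrative only. Because of the padding, a decoder cannot rely on end-of-file alone and has to stop after the known number of symbols, which is what the counter < byteCount check in decompress() does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-ins for BitFileWriter/BitFileReader (the real ones are not shown).
// Assumes bits are packed most-significant-bit first.
public class BitIoSketch {

    static class BitWriter {
        private final OutputStream out;
        private int buffer = 0; // bits collected for the current byte
        private int count = 0;  // how many bits are in buffer (0..7)

        BitWriter(OutputStream out) { this.out = out; }

        void write(int bit) throws IOException {
            buffer = (buffer << 1) | (bit & 1);
            if (++count == 8) {
                out.write(buffer);
                buffer = 0;
                count = 0;
            }
        }

        // Writes only the 8 low-order bits of value, MSB first.
        void writeByte(int value) throws IOException {
            for (int i = 7; i >= 0; i--) {
                write((value >>> i) & 1);
            }
        }

        // Pads the last partial byte with 0 bits, then closes the stream.
        void close() throws IOException {
            while (count != 0) {
                write(0);
            }
            out.close();
        }
    }

    static class BitReader {
        private final InputStream in;
        private int current = 0;
        private int remaining = 0; // unread bits left in current

        BitReader(InputStream in) { this.in = in; }

        // Returns the next bit (0 or 1), or -1 at end of stream.
        int nextBit() throws IOException {
            if (remaining == 0) {
                current = in.read();
                if (current < 0) {
                    return -1;
                }
                remaining = 8;
            }
            remaining--;
            return (current >>> remaining) & 1;
        }
    }

    // Packs the given bits into bytes (the final byte is zero-padded).
    static byte[] pack(int... bits) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            BitWriter w = new BitWriter(sink);
            for (int b : bits) {
                w.write(b);
            }
            w.close();
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the first n bits of data (-1 for positions past the end).
    static int[] unpack(byte[] data, int n) {
        try {
            BitReader r = new BitReader(new ByteArrayInputStream(data));
            int[] bits = new int[n];
            for (int i = 0; i < n; i++) {
                bits[i] = r.nextBit();
            }
            return bits;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] packed = pack(1, 0, 1);                                // 3 bits -> 10100000
        System.out.println(packed.length + " " + (packed[0] & 0xFF)); // 1 160
    }
}
```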

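I've included my entire Huffman code below, but the main focus is the compress() and decompress() methods at the bottom. At the very least, I'd like to know whether the logic of these methods is sound, and if so, what causes decompression to return an empty file for other file types (i.e. not plain text files)?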

Huffman code:

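import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Tree, Leaf, Node, BitFileReader and BitFileWriter are the poster's own classes (not shown).
public class HuffmanCode {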


    public static Tree buildTree(int[] charFreqs) {
        PriorityQueue<Tree> trees = new PriorityQueue<Tree>();

        for (int i = 0; i < charFreqs.length; i++){
            if (charFreqs[i] > 0){
                trees.offer(new Leaf(charFreqs[i], i));
            }
        }

        //assert trees.size() > 0;

        while (trees.size() > 1) {
            Tree a = trees.poll();
            Tree b = trees.poll();

            trees.offer(new Node(a, b));
        }
        return trees.poll();
    }

    public static void printStruct(Tree tree) {
        //assert tree != null;
        if (tree instanceof Leaf) {
            Leaf leaf = (Leaf)tree;

            System.out.println(leaf.value + " " + leaf.frequency);

        } else if (tree instanceof Node) {
            Node node = (Node)tree;

            // traverse left
            printStruct(node.left);

            // traverse right
            printStruct(node.right);
        }
    }


    public static void printStruct(Tree tree, StringBuffer prefix) {
        //assert tree != null;
        if (tree instanceof Leaf) {
            Leaf leaf = (Leaf)tree;

            System.out.println(leaf.value + "\t" + leaf.frequency + "\t" + prefix);

        } else if (tree instanceof Node) {
            Node node = (Node)tree;

            // traverse left
            prefix.append('0');
            printStruct(node.left, prefix);
            prefix.deleteCharAt(prefix.length()-1);

            // traverse right
            prefix.append('1');
            printStruct(node.right, prefix);
            prefix.deleteCharAt(prefix.length()-1);
        }
    }

    public static void fillEncodeMap(Tree tree, StringBuffer prefix, TreeMap<Integer, String> treeMap) {
        //assert tree != null;
        if (tree instanceof Leaf) {
            Leaf leaf = (Leaf)tree;

            treeMap.put(leaf.value, prefix.toString());

        } else if (tree instanceof Node) {
            Node node = (Node)tree;

            // traverse left
            prefix.append('0');
            fillEncodeMap(node.left, prefix, treeMap);
            prefix.deleteCharAt(prefix.length()-1);

            // traverse right
            prefix.append('1');
            fillEncodeMap(node.right, prefix, treeMap);
            prefix.deleteCharAt(prefix.length()-1);
        }
    }

    public static void fillDecodeMap(Tree tree, StringBuffer prefix, TreeMap<String, Integer> treeMap) {
        //assert tree != null;
        if (tree instanceof Leaf) {
            Leaf leaf = (Leaf)tree;

            treeMap.put(prefix.toString(), leaf.value);

        } else if (tree instanceof Node) {
            Node node = (Node)tree;

            // traverse left
            prefix.append('0');
            fillDecodeMap(node.left, prefix, treeMap);
            prefix.deleteCharAt(prefix.length()-1);

            // traverse right
            prefix.append('1');
            fillDecodeMap(node.right, prefix, treeMap);
            prefix.deleteCharAt(prefix.length()-1);
        }
    }



    public static void compress(File file){
        try {
            Path path = Paths.get(file.getAbsolutePath());
            byte[] content = Files.readAllBytes(path);
            TreeMap<Integer, String> encodeMap = new TreeMap<Integer, String>();
            File nF = new File(file.getName()+"_comp");
            nF.createNewFile();
            BitFileWriter bfw = new BitFileWriter(nF);

            int[] charFreqs = new int[256];

            // read each byte and record the frequencies
            for (byte b : content){
                charFreqs[b&0xFF]++;
            }

            // build tree
            Tree tree = buildTree(charFreqs);

            // build TreeMap
            fillEncodeMap(tree, new StringBuffer(), encodeMap);

            // Writes tree structure in binary form to nF (new file)
            bfw.writeByte(encodeMap.size());
            for(int i=0; i<charFreqs.length; i++){
                if(charFreqs[i] != 0){
                    ByteBuffer b = ByteBuffer.allocate(4);
                    b.putInt(charFreqs[i]);
                    byte[] result = b.array();

                    bfw.writeByte(i);
                    for(int j=0; j<4;j++){
                        bfw.writeByte(result[j]&0xFF);
                    }
                }
            }

            // Write compressed data
            for(byte b : content){
                String code = encodeMap.get(b&0xFF);
                for(char c : code.toCharArray()){
                    if(c == '1'){
                        bfw.write(1);
                    }
                    else{
                        bfw.write(0);
                    }
                }
            }
            bfw.close();
            System.out.println("Compression successful!");

        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void decompress(File file){
        try {
            BitFileReader bfr = new BitFileReader(file);
            int[] charFreqs = new int[256];
            TreeMap<String, Integer> decodeMap = new TreeMap<String, Integer>();
            File nF = new File(file.getName()+"_decomp");
            nF.createNewFile();
            BitFileWriter bfw = new BitFileWriter(nF);
            DataInputStream data = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));

            int uniqueBytes;
            int counter = 0;
            int byteCount = 0;
            uniqueBytes = data.readUnsignedByte();

            // Read frequency table
            while (counter < uniqueBytes){
              int index = data.readUnsignedByte();
              int freq = data.readInt();
              charFreqs[index] = freq;
              counter++;
            }

            // build tree
            Tree tree = buildTree(charFreqs);

            // build TreeMap
            fillDecodeMap(tree, new StringBuffer(), decodeMap);

            // Skip BitFileReader position to actual compressed code
            bfr.skip(uniqueBytes*5);

            // Get total number of compressed bytes
            for(int i=0; i<charFreqs.length; i++){
                if(charFreqs[i] > 0){
                    byteCount += charFreqs[i];
                }
            }

            // Decompress data and write
            counter = 0;
            StringBuffer code = new StringBuffer();

            while(bfr.hasNextBit() && counter < byteCount){
                code.append(""+bfr.nextBit());

                if(decodeMap.containsKey(code.toString())){
                    bfw.writeByte(decodeMap.get(code.toString()));
                    code.setLength(0);
                    counter++;
                }
            }
            bfw.close();
            bfr.close();
            data.close();

            System.out.println("Decompression successful!");

        } 

        catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        File f = new File("test");
        compress(f);
        f = new File("test_comp");
        decompress(f);
    }
}
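decompress() builds each code as a String and looks every prefix up in a TreeMap. That works because Huffman codes are prefix-free. A common alternative is to walk the tree one bit at a time, going left on 0 and right on 1 (the same convention fillDecodeMap() uses), and to emit a byte whenever a leaf is reached. The sketch below uses hypothetical minimal Tree/Leaf/Node classes, since the poster's are not shown. It also handles a tree that is a single leaf (only one distinct byte in the input). In that case fillEncodeMap() assigns the empty string as the code, so no data bits are written at all.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeWalkDecoder {

    // Minimal hypothetical tree types; the poster's Tree/Leaf/Node are not shown.
    interface Tree {}

    static final class Leaf implements Tree {
        final int value;
        Leaf(int value) { this.value = value; }
    }

    static final class Node implements Tree {
        final Tree left, right; // bit 0 goes left, bit 1 goes right (as in fillDecodeMap)
        Node(Tree left, Tree right) { this.left = left; this.right = right; }
    }

    // Decodes exactly symbolCount symbols from bits, ignoring any trailing padding.
    static List<Integer> decode(Tree root, int[] bits, int symbolCount) {
        List<Integer> out = new ArrayList<>();
        if (root instanceof Leaf) {
            // Single distinct byte: its code is empty, so no bits are consumed.
            for (int i = 0; i < symbolCount; i++) {
                out.add(((Leaf) root).value);
            }
            return out;
        }
        Tree cur = root;
        for (int i = 0; i < bits.length && out.size() < symbolCount; i++) {
            Node n = (Node) cur;
            cur = (bits[i] == 0) ? n.left : n.right;
            if (cur instanceof Leaf) {
                out.add(((Leaf) cur).value);
                cur = root; // restart at the root for the next symbol
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Codes: 'a' = 0, 'b' = 10, 'c' = 11
        Tree root = new Node(new Leaf('a'), new Node(new Leaf('b'), new Leaf('c')));
        int[] bits = {1, 0, 0, 1, 1, 0, 0, 0};      // "b a c" followed by padding zeros
        System.out.println(decode(root, bits, 3)); // [98, 97, 99]
    }
}
```

Each bit costs one pointer step instead of a String append plus a map lookup, and the symbol count still guards against decoding the padding bits.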
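EDIT: I found the cause, but I don't know how to fix it or why it happens. The problem is that the charFreqs[] array in my decompress() method never gets filled: all of its values are zero, i.e. according to the array, no byte has any frequency.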
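The first header byte decides how many entries decompress() reads into charFreqs[], so one quick diagnostic is to count the distinct byte values in the input and compare that count with what survives in a single byte. Below is a minimal sketch with hypothetical class and method names.

```java
import java.nio.charset.StandardCharsets;

public class DistinctByteCount {

    // Counts how many of the 256 possible byte values occur in data.
    // This is the n that compress() writes as the first header byte.
    static int distinctBytes(byte[] data) {
        boolean[] seen = new boolean[256];
        int count = 0;
        for (byte b : data) {
            int v = b & 0xFF; // treat the byte as unsigned 0..255
            if (!seen[v]) {
                seen[v] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] text = "hello world".getBytes(StandardCharsets.US_ASCII);
        byte[] allValues = new byte[256];
        for (int i = 0; i < 256; i++) {
            allValues[i] = (byte) i;
        }
        for (byte[] data : new byte[][] {text, allValues}) {
            int n = distinctBytes(data);
            // n & 0xFF is what survives when n is stored in one byte
            System.out.println(n + " distinct, header byte = " + (n & 0xFF));
        }
    }
}
```

To check a real input, pass Files.readAllBytes(Paths.get("test")) to distinctBytes().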

1 Answer:

Answer 0 (score: 2)

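I solved it! The problem was the line bfw.writeByte(encodeMap.size()) in the compress() method. That call writes only a single byte to the file, but encodeMap.size() returns 256 when the map is full, i.e. when all 256 possible byte values occur in the input. A byte cannot hold 256. bfw.writeByte() actually takes an int as its argument, but it writes only the 8 lowest bits of that int, which is as many bits as a byte can hold, so its effective range is that of an unsigned byte: 0-255. The low 8 bits of 256 are all zero, so decompress() read uniqueBytes = 0, never filled charFreqs[], and wrote nothing. Plain text files normally contain far fewer than 256 distinct byte values, which is why they decompressed correctly.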
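The truncation can be reproduced with the standard DataOutputStream, whose writeByte(int) also keeps only the eight low-order bits of its argument. This sketch assumes the poster's bfw.writeByte() behaves the same way, as the answer describes. roundTripCount is a hypothetical helper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ByteTruncationDemo {

    // Writes n with DataOutputStream.writeByte (keeps only the 8 low-order bits)
    // and reads it back with readUnsignedByte, like the header's first byte.
    static int roundTripCount(int n) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(n);
            out.close();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
            return in.readUnsignedByte();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripCount(255)); // 255
        System.out.println(roundTripCount(256)); // 0, so decompress() would read no entries
    }
}
```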

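I fixed it by changing two lines of code. In compress(), bfw.writeByte(encodeMap.size()) became bfw.writeByte(encodeMap.size()-1). In decompress(), uniqueBytes = data.readUnsignedByte() became uniqueBytes = data.readUnsignedByte() + 1. (This relies on the input being non-empty, so that there is at least one unique byte. An empty input would now be read back as a count of 256.)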
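Here is a sketch of the fix as a round trip. It checks that storing n - 1 and adding 1 on read recovers every count from 1 to 256. The second method is an alternative that does not come from the answer. It changes the file format by spending two bytes on the count (writeShort/readUnsignedShort), and that also covers an empty input (n = 0). Both helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CountEncodingSketch {

    // The answer's fix: store n - 1 (valid for n = 1..256), add 1 when reading.
    static int roundTripMinusOne(int n) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(n - 1);
            out.close();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
            return in.readUnsignedByte() + 1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Alternative (changes the file format): store n in two bytes, range 0..65535,
    // so an empty input (n = 0) is representable too.
    static int roundTripShort(int n) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeShort(n);
            out.close();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
            return in.readUnsignedShort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 256; n++) {
            if (roundTripMinusOne(n) != n) throw new AssertionError("n - 1 failed for " + n);
        }
        for (int n = 0; n <= 256; n++) {
            if (roundTripShort(n) != n) throw new AssertionError("short failed for " + n);
        }
        System.out.println("both encodings round-trip every count they support");
    }
}
```

The n - 1 variant keeps the header format unchanged. The two-byte variant trades one extra byte per file for not having to special-case empty input.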