Question

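I'm new to Java and I have to understand a Huffman coding program. The program takes the contents of any file and encodes it according to the Huffman coding scheme. There is one small part of the code I don't understand. It's in the main file: an array of size 256. The comment says that, for simplicity, we assume all characters have a code less than 256. Why does 256 make things simple? What happens if I increase or decrease the size of the array? Also, for some sizes I get an out-of-bounds error. Can someone explain why? Thanks.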

import java.io.File;
import java.io.FileNotFoundException;
import java.math.BigInteger;
import java.util.Scanner;

public class Main {

    public static void main(String[] args) throws FileNotFoundException {

        String a = "test-short.txt";
        final long startTime = System.currentTimeMillis();
        // the annotation belongs on the declaration that opens the Scanner
        @SuppressWarnings("resource")
        String content = new Scanner(new File(a)).useDelimiter("\\Z").next();
        HuffmanCode newCode = new HuffmanCode();

        // we will assume that all our characters will have
        // code less than 256, for simplicity
        int[] charFreqs = new int[256];
        // read each character and record the frequencies
        for (char loop : content.toCharArray()){
            charFreqs[loop]++;
        }

        //Build tree
        //Parse the int array of frequencies to HuffmanTree
        HuffmanTree tree = newCode.createTree(charFreqs);

        // print out results
        System.out.println("Char\tFreq\tHUFF CODE");
        newCode.printResults(tree, new StringBuffer());
        newCode.findHeight(tree);
        printRwquiredResults(content, newCode.realcode, newCode.height, newCode.numberOfNode, newCode.printAverageDepth());
        final long endTime = System.currentTimeMillis();
        System.out.println("Total execution time: " + (endTime - startTime) );
    }

    public static void printRwquiredResults(String content, String compressedCode, int heightOfTree, int huffTreeTotalNode, float avrTreeDepth){
        int textFileLenght = (content.length()*3);
        int textFileCompressed = compressedCode.length();
        float compressionRatio = ((float) textFileLenght/textFileCompressed);
        System.out.println("Uncompressed file size: " + textFileLenght);
        System.out.println("Compressed file size: " + textFileCompressed);
        System.out.printf("Compression ratio: %.6f%n" , compressionRatio);
        System.out.println("Huffman tree height: " + heightOfTree);
        System.out.println("Huffman tree number of nodes: " + huffTreeTotalNode);
        System.out.printf("Huffman tree average depth: %.6f%n", avrTreeDepth);

    }

}

Answer 1

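A Java char (which is what you get when calling toCharArray()) is 16 bits wide, with a range (when interpreted as an unsigned type) from 0 to 65535:

For char, from '\u0000' to '\uffff' inclusive, that is, from 0 to 65535

This model is based on the original Unicode specification.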
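To make the range concrete, here is a minimal sketch showing the number a char turns into when it is used as an array index. The class and method names (CharRangeDemo, codeOf) are made up for illustration and are not part of the question's code.

```java
class CharRangeDemo {
    /** Returns the numeric code of a char, i.e. the value used when it indexes an array. */
    static int codeOf(char c) {
        return c; // implicit widening conversion from char to int, never negative
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));                // 65
        System.out.println(codeOf('\u00C4'));           // 196 (capital A with diaeresis)
        System.out.println(codeOf(Character.MIN_VALUE)); // 0
        System.out.println(codeOf(Character.MAX_VALUE)); // 65535
    }
}
```

So charFreqs[loop]++ in the question is really charFreqs[(int) loop]++, and the index can be anything from 0 to 65535 depending on the file's content.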

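In theory, to record the frequencies of all possible characters, your array would need a size of 65536 (last index + 1). However, most charsets (put simply: the mapping that says which code stands for which character and vice versa) build on ASCII, which uses only 7 bits per character. So if you use only ASCII characters (such as digits, some special characters, the space, and English letters), you can safely assume that all codes lie between 0 and 255 (that is, 8 bits, one bit more than ASCII needs). Moreover, if you use only characters from the ASCII table, you can shrink the array to 128, which is still enough to hold all required frequencies: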

int[] freq = new int[128]; // enough for ASCII characters
freq['A'] = 10;            // okay: ASCII character
freq['Ä'] = 10;            // not okay, will throw an ArrayIndexOutOfBoundsException
                           // as Ä is not an ASCII character

int[] freq = new int[256]; // enough for ASCII characters plus all
                           // 8-bit wide characters
freq['A'] = 10;            // okay: ASCII character
freq['Ä'] = 10;            // okay: Ä is not ASCII, but its char value is 196
                           // (U+00C4, as in ISO-8859-1), which fits in 256 slots
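This also explains the out-of-bounds error in the question: the exception is thrown as soon as the text contains a char whose code is greater than or equal to the array length. As a rough sketch (CharFrequencies, countAll and requiredTableSize are hypothetical names, not part of the question's HuffmanCode class), you could either size the table for the full char range or compute the smallest size a given text needs:

```java
class CharFrequencies {
    /** Counts every char; the table spans the full char range, so no index can be out of bounds. */
    static int[] countAll(String text) {
        int[] freqs = new int[Character.MAX_VALUE + 1]; // 65536 slots
        for (char c : text.toCharArray()) {
            freqs[c]++;
        }
        return freqs;
    }

    /** Returns the smallest table size that works for this text: highest char code + 1. */
    static int requiredTableSize(String text) {
        int max = -1; // stays -1 for an empty text, giving a size of 0
        for (char c : text.toCharArray()) {
            max = Math.max(max, c);
        }
        return max + 1;
    }

    public static void main(String[] args) {
        System.out.println(requiredTableSize("abc"));         // 100, because 'c' is 99
        System.out.println(requiredTableSize("5 \u20AC"));    // 8365, the euro sign is 8364
        System.out.println(countAll("abca")['a']);            // 2
    }
}
```

With new int[256], a text containing the euro sign would fail at index 8364, while a plain English text would not. Whether createTree in the question accepts a table larger than 256 is not shown, so treat the full-size table as an assumption to check against that class.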
