Understanding Java code

Time: 2016-12-10 10:55:19

Tags: java

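I'm new to Java and I have to understand a Huffman coding program. The program reads the contents of any file and encodes it according to the Huffman coding scheme. There is one small part of the code that I don't understand. It is in the main file: an array of size 256. The comment says that, for simplicity, we assume all characters have a code less than 256. Why does 256 make things simple? What happens if I increase or decrease the size of the array? Also, for some sizes I get an out-of-bounds error. Can someone explain why? Thanks.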

import java.io.File;
import java.io.FileNotFoundException;
import java.math.BigInteger;
import java.util.Scanner;

public class Main {

    public static void main(String[] args) throws FileNotFoundException {

        String a = "test-short.txt";
        final long startTime = System.currentTimeMillis();
        // the Scanner is never closed, so the "resource" warning belongs on this declaration
        @SuppressWarnings("resource")
        String content = new Scanner(new File(a)).useDelimiter("\\Z").next();
        HuffmanCode newCode = new HuffmanCode();

        // we will assume that all our characters will have
        // code less than 256, for simplicity
        int[] charFreqs = new int[256];
        // read each character and record the frequencies
        for (char loop : content.toCharArray()){
            charFreqs[loop]++;
        }

        //Build tree
        //Parse the int array of frequencies to HuffmanTree
        HuffmanTree tree = newCode.createTree(charFreqs);

        // print out results
        System.out.println("Char\tFreq\tHUFF CODE");
        newCode.printResults(tree, new StringBuffer());
        newCode.findHeight(tree);
        printRwquiredResults(content, newCode.realcode, newCode.height, newCode.numberOfNode, newCode.printAverageDepth());
        final long endTime = System.currentTimeMillis();
        System.out.println("Total execution time: " + (endTime - startTime) );
    }

    public static void printRwquiredResults(String content, String compressedCode, int heightOfTree, int huffTreeTotalNode, float avrTreeDepth){
        int textFileLenght = (content.length()*3);
        int textFileCompressed = compressedCode.length();
        float compressionRatio = ((float) textFileLenght/textFileCompressed);
        System.out.println("Uncompressed file size: " + textFileLenght);
        System.out.println("Compressed file size: " + textFileCompressed);
        System.out.printf("Compression ratio: %.6f%n" , compressionRatio);
        System.out.println("Huffman tree height: " + heightOfTree);
        System.out.println("Huffman tree number of nodes: " + huffTreeTotalNode);
        System.out.printf("Huffman tree average depth: %.6f%n", avrTreeDepth);

    }

}

1 answer:

Answer 0 (score: 0)

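A Java char (which is what you get when calling toCharArray()) is 16 bits wide, and its range (when interpreted as an unsigned type) is 0 to 65535:

  

"For char, from '\u0000' to '\uffff' inclusive, that is, from 0 to 65535"

This model is based on the original Unicode specification.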
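To make the range concrete, here is a minimal sketch of what happens when a char is used as an array index: it is widened to an int between 0 and 65535. The class name CharRangeDemo and the helper indexOf are made up for illustration and are not part of the code in the question.

```java
public class CharRangeDemo {

    // A char used as an array index is implicitly widened to an int (0..65535).
    static int indexOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println((int) Character.MIN_VALUE); // 0
        System.out.println((int) Character.MAX_VALUE); // 65535
        System.out.println(indexOf('A'));              // 65, an ASCII character
        System.out.println(indexOf('\u00C4'));         // 196, the character Ä
        System.out.println(indexOf('\u20AC'));         // 8364, the euro sign
    }
}
```

So an index such as charFreqs[loop] can be anything up to 65535, depending on which characters the file contains.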

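In theory, to represent the frequencies of all possible characters, your array would need a size of 65536 (last index + 1). However, most charsets (simply put: the mapping of which code represents which character and vice versa) build on ASCII, which uses only 7 bits per character. So if you only use ASCII characters (e.g. digits, some special characters, the space and the English letters), you can safely assume that all codes lie between 0 and 255 (that is 8 bits, one bit more than ASCII needs). A character whose code is greater than or equal to the array length makes charFreqs[loop]++ throw an ArrayIndexOutOfBoundsException, which is the out-of-bounds error you get for some sizes. Moreover, if you only use characters from the ASCII table, you can reduce the array size to 128, which is still enough to hold all the required frequencies: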

int[] freq = new int[128]; // enough for ASCII characters
freq['A'] = 10;            // okay: ASCII character
freq['Ä'] = 10;            // not okay, will throw an ArrayIndexOutOfBoundsException
                           // as Ä is not an ASCII character

int[] freq = new int[256]; // enough for ASCII characters plus all
                           // 8-bit wide characters
freq['A'] = 10;            // okay: ASCII character
freq['Ä'] = 10;            // okay: Ä has the code 196 (U+00C4), which is not ASCII,
                           // but our array is large enough to hold it
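
If you cannot guarantee that the file only contains characters below 256, there are two simple ways to avoid the exception. The sketch below shows both. The class FrequencyDemo and the methods countAll and countLimited are hypothetical names, not part of the question's code, and whether HuffmanCode.createTree accepts an array of a different length is an assumption, since that class is not shown.

```java
public class FrequencyDemo {

    // Option 1: size the array for every possible char value (0..65535).
    static int[] countAll(String content) {
        int[] freqs = new int[Character.MAX_VALUE + 1]; // 65536 entries
        for (char c : content.toCharArray()) {
            freqs[c]++;
        }
        return freqs;
    }

    // Option 2: keep a small array, but fail with a clear message
    // instead of an ArrayIndexOutOfBoundsException.
    static int[] countLimited(String content, int size) {
        int[] freqs = new int[size];
        for (char c : content.toCharArray()) {
            if (c >= size) {
                throw new IllegalArgumentException(
                        "Character code " + (int) c + " does not fit into an array of size " + size);
            }
            freqs[c]++;
        }
        return freqs;
    }

    public static void main(String[] args) {
        int[] all = countAll("A\u00C4A");
        System.out.println(all['A']);      // 2
        System.out.println(all['\u00C4']); // 1

        int[] ascii = countLimited("AAB", 128);
        System.out.println(ascii['B']);    // 1
    }
}
```

The larger array costs 65536 ints (about 256 KB), which is usually negligible. The checked variant keeps the memory small and makes the limitation explicit.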