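I'm new to Java and I have to understand a Huffman coding program. It takes the contents of any file and encodes it according to the Huffman coding scheme. There is one small part of the code I don't understand. It is in the main file: an array of size 256. The comment says that, for simplicity, we assume all characters have a code less than 256. Why is 256 simpler? What happens if I increase or decrease the size of the array? Also, for some sizes I get an out-of-bounds error. Can someone explain why? Thanks.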
import java.io.File;
import java.io.FileNotFoundException;
import java.math.BigInteger;
import java.util.Scanner;

public class Main {
    public static void main(String[] args) throws FileNotFoundException {
        String a = "test-short.txt";
        final long startTime = System.currentTimeMillis();
        // the Scanner is never closed, hence the suppressed warning
        @SuppressWarnings("resource")
        String content = new Scanner(new File(a)).useDelimiter("\\Z").next();
        HuffmanCode newCode = new HuffmanCode();
        // we will assume that all our characters will have
        // code less than 256, for simplicity
        int[] charFreqs = new int[256];
        // read each character and record the frequencies
        for (char loop : content.toCharArray()) {
            // the char itself is used as the index, so any character whose
            // value is >= charFreqs.length throws ArrayIndexOutOfBoundsException
            charFreqs[loop]++;
        }
        // Build tree:
        // pass the int array of frequencies to createTree to get a HuffmanTree
        HuffmanTree tree = newCode.createTree(charFreqs);
        // print out results
        System.out.println("Char\tFreq\tHUFF CODE");
        newCode.printResults(tree, new StringBuffer());
        newCode.findHeight(tree);
        printRwquiredResults(content, newCode.realcode, newCode.height, newCode.numberOfNode, newCode.printAverageDepth());
        final long endTime = System.currentTimeMillis();
        System.out.println("Total execution time: " + (endTime - startTime) + " ms");
    }

    public static void printRwquiredResults(String content, String compressedCode, int heightOfTree, int huffTreeTotalNode, float avrTreeDepth) {
        int textFileLenght = (content.length() * 3);
        int textFileCompressed = compressedCode.length();
        float compressionRatio = ((float) textFileLenght / textFileCompressed);
        System.out.println("Uncompressed file size: " + textFileLenght);
        System.out.println("Compressed file size: " + textFileCompressed);
        System.out.printf("Compression ratio: %.6f%n", compressionRatio);
        System.out.println("Huffman tree height: " + heightOfTree);
        System.out.println("Huffman tree number of nodes: " + huffTreeTotalNode);
        System.out.printf("Huffman tree average depth: %.6f%n", avrTreeDepth);
    }
}
Answer:
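A Java char (the type you get from calling toCharArray()) is 16 bits wide and unsigned, so its values range from 0 to 65535. The Java Language Specification states:

For char, from '\u0000' to '\uffff' inclusive, that is, from 0 to 65535

As the documentation of the Character class notes, this model is based on the original Unicode specification, which defined characters as fixed-width 16-bit entities.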
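To see these values directly, here is a small sketch (the class name CharRange is just for illustration) that prints the bounds of the char type and the numeric values of a few characters. Unicode escapes are used for Ä and € so the example does not depend on the encoding of the source file.

```java
public class CharRange {
    public static void main(String[] args) {
        // Character.MIN_VALUE and Character.MAX_VALUE are the bounds of char
        System.out.println((int) Character.MIN_VALUE); // 0
        System.out.println((int) Character.MAX_VALUE); // 65535
        // Casting a char to int yields its numeric (UTF-16 code unit) value,
        // which is exactly what charFreqs[loop] uses as the index
        System.out.println((int) 'A');      // 65
        System.out.println((int) '\u00C4'); // 196 (Ä)
        System.out.println((int) '\u20AC'); // 8364 (€)
    }
}
```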
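In theory, to record the frequency of every possible character, your array would need a size of 65536 (the last index plus one). However, most charsets (roughly speaking: the mapping of which code stands for which character and vice versa) are built on ASCII, which uses only 7 bits per character. In Java, the char values 0 to 127 are exactly the ASCII characters, and 128 to 255 are the Latin-1 characters such as Ä, é or ß. So if your text only contains ASCII characters (digits, some special characters, the space and English letters) or those Latin-1 characters, you can safely assume that every code is between 0 and 255 (i.e. 8 bits, one bit more than ASCII). That is why 256 is the "simple" choice. Any other character, for example € (8364) or the typographic apostrophe ’ (8217), produces an index of 256 or more and throws an ArrayIndexOutOfBoundsException; this is the error you get when the array is too small for your text. Making the array larger never breaks the counting loop, it only adds slots that stay 0. Conversely, if you only use characters from the ASCII table, you can shrink the array to 128, which is still enough to hold all the required frequencies (the printable ASCII characters are the codes 32 to 126):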
int[] freq = new int[128]; // enough for ASCII characters
freq['A'] = 10; // okay: ASCII character (65)
freq['Ä'] = 10; // not okay, throws an ArrayIndexOutOfBoundsException
                // because Ä (196) is not an ASCII character

freq = new int[256]; // enough for ASCII plus the Latin-1 characters 128 to 255
freq['A'] = 10; // okay: ASCII character
freq['Ä'] = 10; // okay: Ä is U+00C4 (196), which is not ASCII,
                // but the array is large enough to hold it
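A minimal sketch of the frequency-counting loop from the question, pulled out into hypothetical helper methods (FreqCount, minArraySize and countFreqs are illustration names, not part of the original program). It computes the smallest array size that works for a given text, shows that a 256-element array fails on €, and shows that an array with one slot per possible char value (Character.MAX_VALUE + 1, i.e. 65536) accepts any char.

```java
public class FreqCount {
    // Smallest array length that can be indexed by every char in s:
    // the largest char value plus one (0 for an empty string).
    static int minArraySize(String s) {
        int max = -1;
        for (char c : s.toCharArray()) {
            max = Math.max(max, c);
        }
        return max + 1;
    }

    // Same loop as in the question: the char is used directly as the index,
    // so any char whose value is >= size throws ArrayIndexOutOfBoundsException.
    static int[] countFreqs(String s, int size) {
        int[] freqs = new int[size];
        for (char c : s.toCharArray()) {
            freqs[c]++;
        }
        return freqs;
    }

    public static void main(String[] args) {
        String text = "A\u00C4\u20AC"; // "AÄ€": values 65, 196, 8364
        System.out.println(minArraySize(text)); // 8365
        try {
            countFreqs(text, 256);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("256 is too small: " + e.getMessage());
        }
        // One slot per possible char value never goes out of bounds.
        int[] all = countFreqs(text, Character.MAX_VALUE + 1);
        System.out.println(all['A'] + " " + all['\u00C4'] + " " + all['\u20AC']); // 1 1 1
    }
}
```

The 65536-slot array costs 256 KiB of ints and works for any char input. Characters outside the Basic Multilingual Plane, such as emoji, are stored as two surrogate chars, so this loop counts them as two separate values.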