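I have tried many different searches and forums, and I went to my professor's office hours, but he only talked with me for 5 minutes and advised against doing this extra-credit assignment.
The Huffman algorithm is pretty straightforward, but decoding is a bit difficult.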
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class DecodeMain {
    public static void main(String[] args) throws IOException {
        String codesFileName = "codes.txt";
        // Huffman code (bit string) -> character it encodes
        Map<String, Character> bin_string_map = new HashMap<String, Character>();
        try (FileReader in = new FileReader(codesFileName)) {
            int c;
            StringBuilder message = new StringBuilder();
            // read characters from the file into a string
            while ((c = in.read()) != -1) {
                message.append((char) c);
            }
            // split string by comma + space (trim() drops surrounding whitespace such as a trailing newline)
            String[] splitted = message.toString().trim().split(", ");
            // for all except first and last
            for (int i = 1; i < splitted.length - 1; i++) {
                // key: substring after '=', value: first char
                bin_string_map.put(splitted[i].substring(2),
                        splitted[i].charAt(0));
            }
            // first entry has one leading char (e.g. '{') before "T=..."
            bin_string_map.put(splitted[0].substring(3), splitted[0].charAt(1));
            // last entry: drop a closing '}' and use its own first char, not splitted[0]'s
            String last = splitted[splitted.length - 1];
            if (last.endsWith("}")) {
                last = last.substring(0, last.length() - 1);
            }
            bin_string_map.put(last.substring(2), last.charAt(0));
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        }
        Path path = Paths.get("compressed.txt");
        byte[] data = Files.readAllBytes(path);
        StringBuilder sb = new StringBuilder();
        // for (int i = 0; i < data.length; i++) {
        //     sb.append(Integer.toBinaryString(data[i]));
        // }
        // print the first byte in binary and look that pattern up in the map
        System.out.println(Integer.toBinaryString(data[0]));
        System.out.println(bin_string_map.get("1011101"));
        // T=101100101
        // h=0011
        // e=000
        // byte[0] 1011101
        // byte[1] 11111111111111111111111110110011
        // byte[2] 110111
        // System.out.println(sb.toString());
        // StringBuilder text = new StringBuilder();
        // int j = 0;
        // for (int i = 0; i < sb.toString().length(); i++) {
        //     String key = sb.toString().substring(j, i);
        //     if (bin_string_map.containsKey(key)) {
        //         text.append(bin_string_map.get(key));
        //         j = i;
        //     }
        // }
        // System.out.println(text.toString());
    }
}
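Because a Huffman code is prefix-free, the commented-out decoding loop can be written as a greedy scan: append one bit at a time to a buffer and, as soon as the buffer matches a key in the code map, emit that character and clear the buffer. (As written, the commented-out loop never tests the final code, since `substring(j, i)` with `i < length` always stops one bit short of the end.) A minimal sketch, assuming the compressed data has already been turned into a string of '0'/'1' characters and the map has the same shape as `bin_string_map` (code to character); the class `PrefixDecode` and method `decode` are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixDecode {
    // Greedy decoding of a prefix-free code: extend the current code one bit
    // at a time and emit a character as soon as the bits form a known code.
    static String decode(String bits, Map<String, Character> codes) {
        StringBuilder text = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < bits.length(); i++) {
            current.append(bits.charAt(i));
            Character ch = codes.get(current.toString());
            if (ch != null) {
                text.append(ch);
                current.setLength(0); // start reading the next code
            }
        }
        // any bits left in 'current' did not form a complete code
        // (for example, padding at the end of the last byte)
        return text.toString();
    }

    public static void main(String[] args) {
        Map<String, Character> codes = new HashMap<>();
        codes.put("101100101", 'T');
        codes.put("0011", 'h');
        codes.put("000", 'e');
        System.out.println(decode("1011001010011000", codes)); // prints "The"
    }
}
```

This sketch does not know how many bits are real data, so padding bits in the last byte could still decode into extra characters; the encoder would have to record the original length or an end marker to rule that out.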
The problem is that when I read my compressed.txt file here, the first 3 bytes are:
]³7
But my book.txt starts with "The", and those characters map to different bit strings:
// T=101100101
// h=0011
// e=000
but the bytes are:
// byte[0] 1011101
// byte[1] 11111111111111111111111110110011
// byte[2] 110111
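Part of the mismatch comes from how the bytes are printed. Java's `byte` is signed, so `Integer.toBinaryString(data[1])` widens 0xB3 to the negative int -77 and prints all 32 bits, and `toBinaryString` never prints leading zeros, so 0x5D (']') comes out as 7 bits and 0x37 ('7') as 6. A minimal sketch that renders each byte as exactly 8 bits by masking with `0xFF` and left-padding; the class `ByteBits` and method `toBitString` are hypothetical names, and the second line simply prints the codes for "The" concatenated, assuming the encoder writes codes in order with the most significant bit first (the encoder isn't shown, so that is an assumption):

```java
public class ByteBits {
    // Render bytes as a bit string, 8 bits per byte, most significant bit first.
    static String toBitString(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            // & 0xFF undoes sign extension: (byte) 0xB3 becomes 179, not -77
            String bits = Integer.toBinaryString(b & 0xFF);
            // toBinaryString drops leading zeros, so pad back to 8 digits
            for (int k = bits.length(); k < 8; k++) {
                sb.append('0');
            }
            sb.append(bits);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // first three bytes of compressed.txt as reported: ']', '³' (0xB3), '7'
        byte[] data = {0x5D, (byte) 0xB3, 0x37};
        System.out.println(toBitString(data)); // prints 010111011011001100110111
        // codes for "The" from codes.txt, concatenated
        System.out.println("101100101" + "0011" + "000"); // prints 1011001010011000
    }
}
```

If the padded bits still don't line up with the expected codes, as they don't for these three bytes, the next thing to check is how the encoder packs bits into bytes, which this post doesn't show.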