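I'm building a Huffman tree to compress text files, but I've run into a problem. The method I'm writing is supposed to take the text data from a FileInputStream and return a Map of characters and their counts. To do that, however, I need to choose a size for the byte[] that stores the data. The problem is that the byte[] has to be exactly the right length, otherwise the Map ends up containing unwanted data as well. Is there a way to give the byte[] the right size?
Here is my code: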
// Provides a count of the characters in an input file and places them in a map
public static Map<Character, Integer> getCounts(FileInputStream input)
        throws IOException {
    Map<Character, Integer> output = new TreeMap<Character, Integer>(); // TreeMap keeps keys in sorted order (chars alphabetized)
    byte[] fileContent = new byte[100]; // fixed-size buffer; this is the size in question
    //ArrayList<Byte> test = new ArrayList<Byte>();
    input.read(fileContent); // reads the input into fileContent
    String test = new String(fileContent); // holds the whole buffer, including any unused bytes
    // goes through each character of the String, using chars as keys and occurrence counts as values
    for (int i = 0; i < test.length(); i++) {
        char temp = test.charAt(i);
        if (output.containsKey(temp)) { // seen this character before; increase count
            int count = output.get(temp);
            System.out.println("repeat; char is: " + temp + "count is: " + count);
            output.put(temp, count + 1);
        } else { // haven't seen this character before; create count of 1
            System.out.println("new; char is: " + temp + "count is: 1");
            output.put(temp, 1);
        }
    }
    return output;
}
Answer 0 (score: 0)
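The return value of FileInputStream.read() is the number of bytes actually read, or -1 at EOF. You can use this value as the bound of your for loop instead of test.length() (this assumes a single-byte encoding, where one byte is one character).
Note that read() is not guaranteed to fill the whole buffer, even when the end of the file has not been reached, so it is usually called in a loop: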
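As a minimal sketch of that first suggestion: the return value of read() can bound the String that is built from the buffer, so the unused tail of the array never reaches the Map. The class name SingleReadExample and the helper readOnce are hypothetical, the parameter is widened to InputStream (a FileInputStream can still be passed), and ISO-8859-1 is an assumed single-byte encoding, not something the question specifies.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class SingleReadExample {
    // Hypothetical helper: decodes only the bytes that one read() call delivered.
    static String readOnce(InputStream input, int bufferSize) throws IOException {
        byte[] fileContent = new byte[bufferSize];
        int bytesRead = input.read(fileContent); // -1 at end of stream
        if (bytesRead <= 0) {
            return ""; // nothing was read
        }
        // Assumption: ISO-8859-1, which maps each byte to exactly one char.
        // Only indices 0..bytesRead-1 are decoded; the rest of the array is ignored.
        return new String(fileContent, 0, bytesRead, StandardCharsets.ISO_8859_1);
    }
}
```

A single call like this still sees at most bufferSize bytes, and possibly fewer, so it only suits inputs known to fit in the buffer.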
byte[] buf = new byte[256]; // any reasonable buffer size works
int bytesRead;
// Read until there are no more bytes to read.
while ((bytesRead = input.read(buf)) != -1) {
    // The next bytesRead bytes are available in buf here
}
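Applied to the counting method from the question, the loop might look like the sketch below. It assumes a single-byte encoding (each byte is one character), takes an InputStream rather than a FileInputStream so any stream can be passed in, and uses the hypothetical class name ByteCounts. Because counts are accumulated chunk by chunk, the buffer size no longer has to match the file size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;

class ByteCounts {
    // Counts characters chunk by chunk, assuming one byte == one character.
    static Map<Character, Integer> getCounts(InputStream input) throws IOException {
        Map<Character, Integer> output = new TreeMap<>();
        byte[] buf = new byte[256];
        int bytesRead;
        while ((bytesRead = input.read(buf)) != -1) {
            // Only the first bytesRead entries of buf hold data from this call.
            for (int i = 0; i < bytesRead; i++) {
                char c = (char) (buf[i] & 0xFF); // mask to avoid sign extension
                output.merge(c, 1, Integer::sum); // insert 1 or add 1 to the existing count
            }
        }
        return output;
    }
}
```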
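Finally, if your text is Unicode, this approach will not work, because read() can stop in the middle of a multi-byte character. Consider wrapping the FileInputStream in an InputStreamReader: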
Reader fileReader = new InputStreamReader(input, "UTF-8");
int charsRead;
char[] buf = new char[256];
while ((charsRead = fileReader.read(buf)) > 0) {
    // The next charsRead characters are available in buf here
}
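Putting the Reader approach together with the counting logic gives something like the following sketch. The class name CharCounts is hypothetical, the input is assumed to be UTF-8, and the parameter is an InputStream so a FileInputStream can be passed unchanged. Note that it counts Java char values (UTF-16 code units), so a character outside the Basic Multilingual Plane would be counted as two surrogates.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

class CharCounts {
    // Counts chars decoded from the stream; assumes the input is UTF-8.
    static Map<Character, Integer> getCounts(InputStream input) throws IOException {
        Map<Character, Integer> output = new TreeMap<>();
        // The reader handles multi-byte sequences that straddle read boundaries.
        Reader reader = new InputStreamReader(input, StandardCharsets.UTF_8);
        char[] buf = new char[256];
        int charsRead;
        while ((charsRead = reader.read(buf)) != -1) {
            for (int i = 0; i < charsRead; i++) {
                output.merge(buf[i], 1, Integer::sum); // insert 1 or add 1 to the existing count
            }
        }
        return output;
    }
}
```

A caller could open the file with try-with-resources, for example `try (FileInputStream in = new FileInputStream(path)) { ... CharCounts.getCounts(in) ... }`, where `path` stands for whatever file is being compressed.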