Question

我正在创建一个Huffman树来压缩文本文件，但我遇到了一些问题。我正在制作的这个方法应该使用FileInputStream输入文本数据并返回Map个字符和计数。但是，要做到这一点，我需要定义byte[]的大小来存储数据。问题是byte[]数组大小需要恰当的长度，否则Map也会有一些不需要的数据。有没有办法让byte[]大小合适？

这是我的代码：

// provides a count of characters in an input file and place in map
public static Map<Character, Integer> getCounts(FileInputStream input)
        throws IOException {
    Map<Character, Integer> output = new TreeMap<Character, Integer>(); // treemap keeps keys in sorted order (chars alphabetized)
    byte[] fileContent = new byte[100]; // creates a byte[]
    //ArrayList<Byte> test = new ArrayList<Byte>();
    input.read(fileContent);                // reads the input into fileContent
    String test = new String(fileContent);  // contains entire file into this string to process

    // goes through each character of String to put chars as keys and occurrences as keys
    for (int i = 0; i < test.length(); i++) {
        char temp = test.charAt(i);
        if (output.containsKey(temp)) { // seen this character before; increase count

            int count = output.get(temp);
            System.out.println("repeat; char is: " + temp + "count is: " + count);
            output.put(temp, count + 1);
        } else {                        // Haven't seen this character before; create count of 1
            System.out.println("new; char is: " + temp + "count is: 1");
            output.put(temp, 1);
        }
    }
    return output;
}

Answer 1

FileInputStream.read()的返回值是实际读取的字节数，如果是EOF，则为-1。您可以在 for 循环中使用此值代替test.length()。

请注意，read()无法保证读取缓冲区长度的字节数，即使未到达文件末尾，因此通常在循环中使用它：

int bytesRead;

//Read until there is no more bytes to read.
while((bytesRead = input.read(buf))!=-1)
{
    //You have next bytesRead bytes in a buffer here      
}

最后，如果你的字符串是Unicode，这种方法将不起作用，因为read()可以终止中间字符。考虑使用InputStreamReader打包FileInputStream：

Reader fileReader = new InputStreamReader(input, "UTF-8");

int charsRead;
char buf[] = new char[256];

while ((charsRead = fileReader.read(buf)) > 0) {
   //You have charsRead characters in a buffer here
}

FileInputStream和Huffman Tree

1 个答案: