Question

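I have nearly 500 text files containing 10 million words in total, and I have to index these words. What is the fastest way to read a text file character by character? Here is my initial attempt: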

InputStream ist = new FileInputStream(this.path + "/" + doc);
BufferedReader in = new BufferedReader(new InputStreamReader(ist));

String line;

while ((line = in.readLine()) != null) {
    line = line.toUpperCase(Locale.ENGLISH);
    String word = "";

    // Use j < line.length(); with <= the last charAt(j) call would throw
    // StringIndexOutOfBoundsException.
    for (int j = 0; j < line.length(); j++) {
        char c = line.charAt(j);
        // OPERATIONS
    }
}

Answer 1

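Switching to read() will not make a considerable difference in performance.

Read more: Peter Lawrey's comparison of read() and readLine()

Now, back to the original question:
Input string: hello how are you?
So you need to index the words of each line, i.e.: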

BufferedReader r = new BufferedReader(new InputStreamReader(inputStream));
String line;
while ((line = r.readLine()) != null) {
   String[] splitString = line.split("\\s+");
   //Do stuff with the array here, i.e. construct the index.
}

Note: the pattern \\s+ splits on any run of whitespace (spaces, tabs, and so on), so it takes care of the delimiters in the string.
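To make the "construct the index" step concrete, here is a minimal sketch, assuming the index is simply a count per word. The class name SplitIndexSketch, the method countWords, and the choice of a TreeMap are hypothetical and not from the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class SplitIndexSketch {
    // Counts how often each whitespace-separated token occurs in the input.
    static Map<String, Integer> countWords(BufferedReader r) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        String line;
        while ((line = r.readLine()) != null) {
            // trim() avoids an empty first token when a line starts with whitespace
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            for (String word : trimmed.split("\\s+")) {
                // Upper-case as in the question so that "How" and "how" match
                counts.merge(word.toUpperCase(Locale.ENGLISH), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader r = new BufferedReader(
                new StringReader("hello how are you?\nhow are things"));
        System.out.println(countWords(r));
    }
}
```

Note that splitting on whitespace alone leaves punctuation attached, so "you?" is indexed as YOU? here. If that matters, the tokens need an extra cleanup step.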

Answer 2

InputStreamReader's read() method can read one character at a time.

You can use a FileReader (a subclass of InputStreamReader) or wrap the reader in a BufferedReader, for example.
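A minimal sketch of such a read() loop follows. The class CharReadSketch and the method countLetters are hypothetical names, and counting letters just stands in for whatever per-character work you need.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class CharReadSketch {
    // Reads the stream one character at a time and counts the letters.
    static int countLetters(Reader in) throws IOException {
        int letters = 0;
        int c; // read() returns an int: the char value, or -1 at end of stream
        while ((c = in.read()) != -1) {
            if (Character.isLetter((char) c)) {
                letters++;
            }
        }
        return letters;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a file here; BufferedReader adds buffering
        try (Reader in = new BufferedReader(new StringReader("hello how are you?"))) {
            System.out.println(countLetters(in));
        }
    }
}
```

For a real file, the reader could be created with new BufferedReader(new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8)), assuming the files are UTF-8 encoded. The buffering matters: without it, each read() call may go to the underlying stream.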

Hope this helps!

Answer 3

Don't read lines and then rescan each line char by char. That way you process every character twice. Just read the characters via BufferedReader.read().
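As a rough sketch of that single-pass idea, assuming a word is a run of letters or digits and the index is a count per word (OnePassIndexSketch and index are hypothetical names):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

class OnePassIndexSketch {
    // Builds word counts while reading, so each character is handled only once.
    static Map<String, Integer> index(Reader reader) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder word = new StringBuilder();
        BufferedReader in = new BufferedReader(reader);
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (Character.isLetterOrDigit(ch)) {
                // Still inside a word: append the upper-cased character
                word.append(Character.toUpperCase(ch));
            } else if (word.length() > 0) {
                // A non-word character ends the current word
                counts.merge(word.toString(), 1, Integer::sum);
                word.setLength(0);
            }
        }
        if (word.length() > 0) {
            // Flush the last word when the input does not end with a delimiter
            counts.merge(word.toString(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(index(new StringReader("hello how are you? how")));
    }
}
```

Unlike the split("\\s+") approach, this treats punctuation as a delimiter, so "you?" is indexed as YOU. Character.toUpperCase is locale-independent, which differs slightly from the toUpperCase(Locale.ENGLISH) call in the question.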

Java - Fastest way to read a text file char by char
