Question

I'm trying to count the occurrences of a string in a .txt file using a BufferedReader. This is what I'm using:

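File file = new File(path);
int appearances = 0; // declared outside the try block so it is still in scope for the println below
try (BufferedReader br = new BufferedReader(new FileReader(file))) { // try-with-resources closes the reader
  String line;
  while ((line = br.readLine()) != null) {
      if (line.contains("Hello")) { // true at most once per line, however many matches the line has
         appearances++;
      }
  }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println("Found " + appearances);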

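The problem is that if my .txt file contains, for example, the string "Hello, world\nHello, Hello, world!" and I search for "Hello", the count comes out as two instead of three, because the code only checks whether each line contains the string at least once. How can I fix this? Many thanks.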

Answer 1

The simplest solution is:

while ((line = br.readLine()) != null) 
    appearances += line.split("Hello", -1).length-1;

Note that if, instead of "Hello", you search for anything containing regex-reserved characters, you should escape the string before splitting:

String escaped = Pattern.quote("Hello."); // avoid '.' special meaning in regex
while ((line = br.readLine()) != null) 
    appearances += line.split(escaped, -1).length-1;
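
For anyone who wants to try the split approach end to end, here is a minimal self-contained sketch. The class name `SplitCounter` and the helper `count` are made up for illustration, and a `StringReader` stands in for the file so the example runs without any .txt on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;

// Hypothetical wrapper class, only here to make the snippet runnable.
public class SplitCounter {

    // Counts non-overlapping occurrences of a literal, non-empty needle
    // in everything the reader yields.
    static int count(BufferedReader br, String needle) throws IOException {
        String escaped = Pattern.quote(needle); // treat the needle as a literal, not a regex
        int appearances = 0;
        String line;
        while ((line = br.readLine()) != null) {
            // Limit -1 keeps trailing empty strings, so a match at the very
            // end of a line still adds one piece to the array.
            appearances += line.split(escaped, -1).length - 1;
        }
        return appearances;
    }

    public static void main(String[] args) throws IOException {
        // Same sample text as in the question; assumed to stand in for the file.
        String text = "Hello, world\nHello, Hello, world!";
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            System.out.println("Found " + count(br, "Hello")); // prints "Found 3"
        }
    }
}
```

To read a real file, swap the `StringReader` for a `FileReader` as in the question.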

Answer 2

Here is an efficient and correct solution:

String line;
int count = 0;
while ((line = br.readLine()) != null) {
    int index = -1; // restart the search at the beginning of every line
    while ((index = line.indexOf("Hello", index + 1)) != -1) {
        count++;
    }
}
return count;

It iterates over the line, searching for the next occurrence starting from the previous index + 1.
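
One consequence of restarting at the previous index + 1 is that overlapping matches are counted too. A small sketch of the per-line loop, pulled out into a method so it can be run on its own (the class and method names are hypothetical, and the empty-needle guard is my addition, since `indexOf("")` always succeeds and the loop would never end):

```java
// Hypothetical wrapper class, only here to make the snippet runnable.
public class IndexCounter {

    // Counts occurrences of needle in line, including overlapping ones,
    // by restarting each search one character after the previous hit.
    static int countInLine(String line, String needle) {
        if (needle.isEmpty()) {
            // Added guard (not in the answer): an empty needle matches everywhere.
            throw new IllegalArgumentException("needle must not be empty");
        }
        int count = 0;
        int index = -1;
        while ((index = line.indexOf(needle, index + 1)) != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countInLine("Hello, Hello, world!", "Hello")); // prints 2
        System.out.println(countInLine("aaa", "aa"));                     // prints 2 (overlapping matches)
    }
}
```

If only non-overlapping matches should count, one option would be to start the first search at index 0 and advance by `needle.length()` after each hit instead of by 1.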

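The problem with Peter's solution is that it is wrong (see my comment). The problem with TheLostMind's solution is that it creates many new strings through replacement, which is an unnecessary performance hit.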

Answer 3

A regex-driven version:

String line;
Pattern p = Pattern.compile(Pattern.quote("Hello")); // quotes in case you need 'Hello.'
int count = 0;
while ((line = br.readLine()) != null) {
    // each successful find() advances the matcher and bumps the counter
    for (Matcher m = p.matcher(line); m.find(); count++) { }
}
return count;

I'm now curious about the performance difference between this and gexicide's version; I'll edit when I have results.

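Edit: benchmarked by running 100 times over a ~800k log file, searching for a string found once at the start, one found once around the middle, one found once at the end, and one found several times throughout. Results: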

IndexFinder: 1579ms, 2407200hits. // gexicide's code
RegexFinder: 2907ms, 2407200hits. // this code
SplitFinder: 5198ms, 2407200hits. // Peter Lawrey's code, after quoting regexes

Conclusion: for non-regex strings, the repeated-indexOf approach is the fastest.

Basic benchmark code (using a log file from a vanilla Ubuntu 12.04 installation):

public static void main(String ... args) throws Exception {
    Finder[] fs = new Finder[] {
        new SplitFinder(), new IndexFinder(), new RegexFinder()};
    File log = new File("/var/log/dpkg.log.1"); // around 800k in size
    Find test = new Find();
    for (int i=0; i<100; i++) {
        for (Finder f : fs) {
            test.test(f, log, "2014"); // start
            test.test(f, log, "gnome"); // mid
            test.test(f, log, "ubuntu1"); // end
            test.test(f, log, ".1"); // multiple; not at start
        }
    }
    test.printResults();
}
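
The `Finder` and `Find` classes used above are not shown in the answer. Purely as a guess at the general shape, and not the author's actual code, a strategy interface plus one implementation could look like the sketch below. Every name and signature here (`Finder.count`, `RegexFinder`, `FinderSketch`) is an assumption, and the timing is a single naive measurement rather than a proper benchmark.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical interface; the original Finder is not shown in the answer.
interface Finder {
    int count(BufferedReader in, String needle) throws IOException;
}

// Regex strategy in the spirit of the code at the top of this answer.
class RegexFinder implements Finder {
    @Override
    public int count(BufferedReader in, String needle) throws IOException {
        Pattern p = Pattern.compile(Pattern.quote(needle)); // literal match only
        int hits = 0;
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = p.matcher(line);
            while (m.find()) { // find() reports non-overlapping matches
                hits++;
            }
        }
        return hits;
    }
}

public class FinderSketch {
    public static void main(String[] args) throws IOException {
        // Assumed sample input; the real benchmark read /var/log/dpkg.log.1 instead.
        String text = "Hello, world\nHello, Hello, world!";
        Finder finder = new RegexFinder();
        long start = System.nanoTime();
        int hits = finder.count(new BufferedReader(new StringReader(text)), "Hello");
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println(finder.getClass().getSimpleName()
                + ": " + elapsedMicros + "us, " + hits + "hits.");
    }
}
```

Putting each strategy behind the same interface is what lets a loop like `for (Finder f : fs)` run identical searches against every implementation and compare hit counts as a sanity check.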

Answer 4

while (line.contains("Hello")) { // loop while the line still contains "Hello"
    appearances++;
    line = line.replaceFirst("Hello", ""); // replace the first occurrence of "Hello" with an empty String (the first argument is treated as a regex)
}

BufferedReader - Searching for a string in a .txt file
