Retrieving a substring from a String using search

Asked: 2011-08-10 09:29:44

Tags: java html string pattern-matching

Is there a fast way to search for a string inside another string?

I have a file like this:

<br>
Comment EC00: 
<br>
The EC00 is different from EC12 next week. The EC00 much wetter in the very end, which is not seen before.
<br>

<br>

<br>
Comment EC12: 
<br>
The Ec12 of today is reliable. It starts cold, but temp are rising. From Sunday normal temp and wet, except for a strengthening high from SE in the very end.
<br>

One option is to remove all the <br> tags and then search for a string such as "Comment EC12:" to retrieve what follows it:

The Ec12 of today is reliable. It starts cold, but temp are rising. From Sunday normal temp and wet, except for a strengthening high from SE in the very end.

Or maybe it is better to leave all the <br> tags in place, so that I at least know where to stop reading the lines...

P.S. These comments may appear multiple times in the document.

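Edit: I think this solution finds the occurrences and is at least a good starting point. This is the final version. It works well for me because I know which parts of the HTML are static and which are not. For those who want to do the same thing, you can rewrite the first two loops in a similar way to the last one (use 'while' instead of 'if', walking along the lines of the text file).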

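    // Requires: import java.util.StringTokenizer;
    // weatherComments is the text to scan; forecastZone is presumably a zone name such as "EC12".
    StringTokenizer parser = new StringTokenizer(weatherComments);

    while (parser.hasMoreTokens()) {
        if (parser.nextToken().equals("Comment") && parser.hasMoreTokens()) {
            String commentType = parser.nextToken();
            if (commentType.equals(forecastZone + ":") && parser.hasMoreTokens()) {
                parser.nextToken(); // skip the first <br> after the header
                // Start a fresh buffer so repeated comments are not concatenated.
                StringBuilder commentLine = new StringBuilder();
                // Collect words until the closing <br> or the end of the input.
                while (parser.hasMoreTokens()) {
                    String commentWord = parser.nextToken();
                    if (commentWord.equals("<br>")) {
                        break;
                    }
                    commentLine.append(commentWord).append(' ');
                }
                commentLine.append('\n');
                System.out.println(commentLine);
            }
        }
    }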

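P.P.S. Before downloading a pile of libraries to make the code look smaller or easier to understand, first think about how to solve the problem yourself.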

3 Answers:

Answer 0 (score: 0)

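First, I would remove the empty lines and the <br> tags. Then I would implement an algorithm such as BNDM for the search or, better, use a library such as StringSearch, described on its site as "High-performance pattern matching algorithms in Java": http://johannburkard.de/software/stringsearch/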
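
For readers who want to see what the BNDM idea looks like, here is a minimal sketch of the textbook bit-parallel formulation. It is not the StringSearch library's code; the class name `BndmSearch` and the method `findAll` are made up for illustration, and the sketch assumes the pattern is between 1 and 63 characters long so that it fits into a single `long`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a sketch of BNDM (Backward Nondeterministic Dawg Matching).
class BndmSearch {

    // Returns the start index of every occurrence of pattern in text.
    static List<Integer> findAll(String text, String pattern) {
        int m = pattern.length();
        if (m == 0 || m > 63) {
            throw new IllegalArgumentException("pattern length must be 1..63");
        }
        // One bit mask per pattern character; position i sets bit (m - 1 - i).
        Map<Character, Long> masks = new HashMap<>();
        for (int i = 0; i < m; i++) {
            char c = pattern.charAt(i);
            long bit = 1L << (m - 1 - i);
            Long old = masks.get(c);
            masks.put(c, old == null ? bit : (old | bit));
        }
        long all = (1L << m) - 1;   // m low bits set
        long high = 1L << (m - 1);  // set when a pattern prefix has been recognised

        List<Integer> hits = new ArrayList<>();
        int pos = 0;
        while (pos <= text.length() - m) {
            int j = m;       // number of window characters still unread
            int shift = m;   // how far the window moves afterwards
            long d = all;
            // Read the window from right to left while some factor still matches.
            while (d != 0) {
                Long mask = masks.get(text.charAt(pos + j - 1));
                d &= (mask == null) ? 0L : mask;
                j--;
                if ((d & high) != 0) {
                    if (j > 0) {
                        shift = j;      // a pattern prefix starts here: remember it
                    } else {
                        hits.add(pos);  // the whole window matched
                    }
                }
                d = (d << 1) & all;     // keep only the m low bits
            }
            pos += shift;
        }
        return hits;
    }
}
```

For example, `BndmSearch.findAll(html, "Comment EC12:")` would return the positions of every such header, which can then be passed to `substring`. For a short file like the one in the question, plain `indexOf` is likely to be fast enough.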

Answer 1 (score: 0)

You can try simply using indexOf():

String html = ...;
String search = "Comment EC12:";
int comment = html.indexOf(search);
if (comment != -1) {
  int start = comment + search.length();
  int end = start + ...;
  String after = html.substring(start, end);
  ...
}
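
As a sketch of how this could be extended to every occurrence, the version below keeps the <br> tags and treats the next <br> as the end of the text. That end rule is an assumption about the file format, not something the snippet above specifies, and the names `IndexOfCommentSearch` and `findComments` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper built on String.indexOf only.
class IndexOfCommentSearch {

    // Returns the text that follows every "Comment <zone>:" header.
    // Assumption: a <br> follows the header and the next <br> ends the text.
    static List<String> findComments(String html, String zone) {
        String header = "Comment " + zone + ":";
        String br = "<br>";
        List<String> found = new ArrayList<>();
        int from = 0;
        while (true) {
            int headerAt = html.indexOf(header, from);
            if (headerAt == -1) {
                break; // no further occurrences
            }
            int start = headerAt + header.length();
            // Skip the <br> that directly follows the header, if there is one.
            int firstBr = html.indexOf(br, start);
            if (firstBr != -1 && html.substring(start, firstBr).trim().isEmpty()) {
                start = firstBr + br.length();
            }
            // The text ends at the next <br>, or at the end of the input.
            int end = html.indexOf(br, start);
            if (end == -1) {
                end = html.length();
            }
            found.add(html.substring(start, end).trim());
            from = end; // continue searching after this comment
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<br>\nComment EC00: \n<br>\nText for EC00.\n<br>\n"
                    + "<br>\nComment EC12: \n<br>\nText for EC12.\n<br>\n";
        System.out.println(findComments(html, "EC12"));
    }
}
```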

The problem is finding the end of the text. So it may be useful not to strip the <br> tags, and instead to split the HTML on them:

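String html = ...;
// Split on each <br> together with any surrounding whitespace.
String[] parts = html.split("\\p{Space}*<br>\\p{Space}*");
// Parts are assumed to alternate between a comment header and its text.
for (int i = 0; i + 1 < parts.length; i += 2) {
  String search = parts[i];
  String after = parts[i + 1];
  System.out.println(search + "\n\t" + after);
}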

For input in which each header and each text is separated by a single <br>, the example prints the following:

Comment EC00:
    The EC00 is different from EC12 next week. The EC00 much wetter in the very end, which is not seen before.
Comment EC12:
    The Ec12 of today is reliable. It starts cold, but temp are rising. From Sunday normal temp and wet, except for a strengthening high from SE in the very end.
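
The file in the question also starts with a <br> and contains several consecutive <br> tags, which produce empty parts and would break a strict header/text alternation. A slightly more defensive sketch is shown here. It assumes that every header has the form "Comment XXXX:" and that everything up to the next header belongs to it; the names `SplitCommentSearch` and `parse` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups the texts under the header that precedes them.
class SplitCommentSearch {

    // Maps each "Comment XXXX:" header to the list of texts that follow it.
    static Map<String, List<String>> parse(String html) {
        Map<String, List<String>> comments = new LinkedHashMap<>();
        String currentHeader = null;
        for (String part : html.split("<br>")) {
            String text = part.trim();
            if (text.isEmpty()) {
                continue; // produced by leading or consecutive <br> tags
            }
            if (text.startsWith("Comment ") && text.endsWith(":")) {
                currentHeader = text;
                if (!comments.containsKey(currentHeader)) {
                    comments.put(currentHeader, new ArrayList<String>());
                }
            } else if (currentHeader != null) {
                comments.get(currentHeader).add(text);
            }
        }
        return comments;
    }
}
```

Because a header can occur more than once in the document, each header maps to a list: `parse(html).get("Comment EC12:")` returns all the texts found for that zone, in document order.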

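Answer 2 (score: 0)

Depending on what you want to achieve, this may be overkill, but I suggest using a finite-state-automaton string search. You can look at the example at http://en.literateprograms.org/Finite_automaton_string_search_algorithm_%28Java%29.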
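
To give a rough idea of the technique, here is a minimal sketch of a matching automaton. It is not the code from the linked page; the class `AutomatonSearch` and its method `findAll` are made up for illustration. The transition table is built once per pattern, after which the text is scanned one character at a time without ever going back.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a deterministic automaton that recognises one pattern.
class AutomatonSearch {
    private final Map<Character, Integer> symbols = new HashMap<>();
    private final int[][] next; // next[state][symbol] -> following state
    private final int accept;   // state reached after the full pattern

    AutomatonSearch(String pattern) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("empty pattern");
        }
        // Only characters of the pattern get a column; any other
        // character sends the automaton back to state 0.
        for (char c : pattern.toCharArray()) {
            if (!symbols.containsKey(c)) {
                symbols.put(c, symbols.size());
            }
        }
        accept = pattern.length();
        next = new int[accept + 1][symbols.size()];
        next[0][symbols.get(pattern.charAt(0))] = 1;
        int fallback = 0; // state to imitate when the next character mismatches
        for (int state = 1; state <= accept; state++) {
            // On a mismatch, behave like the fallback state.
            System.arraycopy(next[fallback], 0, next[state], 0, symbols.size());
            if (state < accept) {
                int s = symbols.get(pattern.charAt(state));
                next[state][s] = state + 1;   // on a match, advance
                fallback = next[fallback][s];
            }
        }
    }

    // Returns the start index of every (possibly overlapping) occurrence.
    List<Integer> findAll(String text) {
        List<Integer> hits = new ArrayList<>();
        int state = 0;
        for (int i = 0; i < text.length(); i++) {
            Integer s = symbols.get(text.charAt(i));
            state = (s == null) ? 0 : next[state][s];
            if (state == accept) {
                hits.add(i - accept + 1);
            }
        }
        return hits;
    }
}
```

A call such as `new AutomatonSearch("Comment EC12:").findAll(html)` would yield the header positions. The automaton pays off mainly when the same pattern is searched in a lot of text.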