Regex Lookahead & Lookbehind with Java

Date: 2014-07-18 14:59:40

Tags: java regex expression lookahead

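I'm trying to parse data from an HTML page using Java regex, without much luck. The data is dynamic and often contains zero or more spaces, tabs, and newlines. Also, depending on the number of hits, the structure of the string I'm parsing can change. Here is an example in its cleanest form: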

<div class="center">Showing 25 of 2,343,098 (search took 1.245 seconds)</div>

But it can also look like either of these:

<div class="center">Showing 2343098 (search took 1.245 seconds)</div>

<div class="center">

  Showing            125 

 of 2,343,098 




(search took 1.245 seconds)</div>

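What I'm trying to extract is 2,343,098, but since the page is HTML, I have to search between "Showing" and "(search". The spaces, tabs, and newlines are frustrating me. I've been trying to use lookahead and lookbehind, but no luck so far. Here are some of the patterns I've tried: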

String pattern1 = "Showing [0-9]*\\S"; // not useful
String pattern2 = "[[\\d,+\\.?\\d+]*[\\s*\\n]\\(search took"; //fails
String pattern3 = "(/i)(Showing)(.+?)(\\(search took)"; //fails
String pattern4 = "([\\s\\S]*)\\(search took"; //fails
String pattern5 = "(?s)[\\d].*?(?=\\(search took)"; //close...but fails

Pattern pattern = Pattern.compile(pattern5);
Matcher matcher = pattern.matcher(text); // text = the string I'm parsing
while(matcher.find()) {
    System.out.println(matcher.group(0));
}

2 answers:

Answer 0 (score: 1)

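HTML is not a regular language and cannot be parsed accurately with regular expressions. A regex-based solution is likely to break when the markup changes in the future, whereas a parser-based solution will be more robust.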

However, if this is a one-off job, you can use the following regex:

Showing\s+(?:\d+\s+of\s+)?([\d,.]+)\s+\(search

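As a rough sketch of how the regex above might be used from Java, the snippet below compiles it with the backslashes doubled for a string literal and reads the total from capture group 1. For comparison it also includes a lookahead-only variant, which is an assumption of this sketch and not taken from either answer: it presumes the total consists only of digits and commas and is the last number before "(search took". The asker's pattern5 starts matching at the first digit in the text, which is why it picks up "25 of ..." as well; anchoring the digits directly to the lookahead avoids that. The class and method names (`HitCountExtractor`, `extractTotal`, `extractTotalWithLookahead`) are hypothetical, and the sample strings are the ones from the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original answer.
final class HitCountExtractor {
    // The answer's regex, with backslashes doubled for a Java string literal.
    // Group 1 holds the total whether or not the "N of" prefix is present.
    private static final Pattern CAPTURE =
            Pattern.compile("Showing\\s+(?:\\d+\\s+of\\s+)?([\\d,.]+)\\s+\\(search");

    // Assumed lookahead-only variant: a run of digits and commas that is
    // followed, after optional whitespace, by "(search took".
    private static final Pattern LOOKAHEAD =
            Pattern.compile("\\d[\\d,]*(?=\\s*\\(search took)");

    static Optional<String> extractTotal(String html) {
        Matcher m = CAPTURE.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    static Optional<String> extractTotalWithLookahead(String html) {
        Matcher m = LOOKAHEAD.matcher(html);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] samples = {
            "<div class=\"center\">Showing 25 of 2,343,098 (search took 1.245 seconds)</div>",
            "<div class=\"center\">Showing 2343098 (search took 1.245 seconds)</div>",
            "<div class=\"center\">\n\n  Showing            125 \n\n of 2,343,098 \n\n\n\n\n(search took 1.245 seconds)</div>"
        };
        for (String s : samples) {
            // Print the result of each approach side by side.
            System.out.println(extractTotal(s).orElse("no match")
                    + " | " + extractTotalWithLookahead(s).orElse("no match"));
        }
    }
}
```

No DOTALL flag is needed in either pattern, because `\s` already matches newlines in Java. Both approaches remain fragile if the page markup changes, as noted above.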

Answer 1 (score: 0)

A suggested example:

"Showing\\s+\\d+\\s+(of\\s+[\\d,.]+\\s+)?\\(search"