我有以下代码:
String regex = "Some String(.*\r?\n)*.*\\* testing .*\r?\n.*\r?\n.*children.*\r?\n";
Pattern p = Pattern.compile(regex);
System.out.println(p.pattern());
String content = readFile(); // it return file content
Matcher m = p.matcher(content);
因此,当我使用小文件内容进行测试时,它可以正常工作。但是对于大文件内容,我的代码会出现以下错误:
java.lang.StackOverflowError
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4606)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
at java.util.regex.Pattern$Ques.match(Pattern.java:4079)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
at java.util.regex.Pattern$Ques.match(Pattern.java:4079)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
at java.util.regex.Pattern$Ques.match(Pattern.java:4079)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
at java.util.regex.Pattern$Ques.match(Pattern.java:4079)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
at java.util.regex.Pattern$Ques.match(Pattern.java:4079)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
at java.util.regex.Pattern$Ques.match(Pattern.java:4079)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
at java.util.regex.Pattern$Ques.match(Pattern.java:4079)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
at java.util.regex.Pattern$Ques.match(Pattern.java:4079)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
at java.util.regex.Pattern$Ques.match(Pattern.java:4079)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
...
我认为正则表达式引擎正在产生问题。这可能是因为嵌套量词。那么任何人都可以优化我的正则表达式,以便正则表达式引擎能够轻松处理它吗?
答案 0 :(得分:1)
可能的问题是你的正则表达式匹配太多行。既然你有:
\r?\n
在您的记录中多次,它可以匹配多行。
所以:
"Some String(.*\r?\n)*.*\\* testing .*\r?\n.*\r?\n.*children.*\r?\n"
会匹配
Some String
\ testing
children
Some String
您需要找到一个完成每条记录的表达式,否则正则表达式引擎将能够匹配从第一个记录到最后一个记录,并且可能会破坏。
这个理论显示在堆栈跟踪中,其中有几个堆栈跟踪块与此重复:
at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
at java.util.regex.Pattern$Ques.match(Pattern.java:4079)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)