Getting java.lang.StackOverflowError when matching a regular expression

Date: 2018-07-18 18:56:41

Tags: java regex

The pattern I am using is:

Pattern listPattern = Pattern.compile(
            "\\s*'([^']*('')*)+'\\s*(,\\s*'([^']*('')*)+'\\s*)*"
                    + "|"
                    + "\\s*[0-9\\.\\-]+(,\\s*[0-9\\.-]+)*\\s*",
            Pattern.MULTILINE|Pattern.CASE_INSENSITIVE);

I need this pattern to validate the input before it is added to the in() clause of an SQL query. A typical value looks like this:

String value="'xyz2006201257200426888282d','xyz2006201300193058314082d'";

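Here I used only two IDs, but when the number of IDs like this one (e.g. xyz2006201257200426888282d) grows to roughly 600 or more, I get a StackOverflowError. Can someone fix the inefficiency in this regex that is exhausting the stack?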
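To reproduce this, here is a minimal sketch that reuses the pattern above and generates `n` quoted IDs. The class `OverflowRepro` and the helper `buildIdList` are hypothetical names, and repeating a single ID stands in for a real list of distinct IDs. The size at which the error appears depends on the JVM's thread stack size (`-Xss`), so the program only reports what happens for each size instead of assuming a threshold.

```java
import java.util.Collections;
import java.util.regex.Pattern;

class OverflowRepro {
    // The pattern from the question, unchanged.
    static final Pattern LIST_PATTERN = Pattern.compile(
            "\\s*'([^']*('')*)+'\\s*(,\\s*'([^']*('')*)+'\\s*)*"
                    + "|"
                    + "\\s*[0-9\\.\\-]+(,\\s*[0-9\\.-]+)*\\s*",
            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: n quoted copies of one sample ID, joined with commas,
    // in the same shape as `value` in the question.
    static String buildIdList(int n) {
        return String.join(",", Collections.nCopies(n, "'xyz2006201257200426888282d'"));
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 100, 600}) {
            try {
                boolean ok = LIST_PATTERN.matcher(buildIdList(n)).matches();
                System.out.println(n + " IDs -> matches() = " + ok);
            } catch (StackOverflowError e) {
                // Caught only to report it; where it starts depends on the thread stack size (-Xss).
                System.out.println(n + " IDs -> StackOverflowError");
            }
        }
    }
}
```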

Stack trace:

Exception in thread "main" java.lang.StackOverflowError
at java.lang.Character.codePointAt(Character.java:4866)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3775)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4250)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4485)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:4405)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4801)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4741)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$Loop.match(Pattern.java:4794)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4485)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:4405)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4485)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:4405)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4801)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4741)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$Loop.match(Pattern.java:4794)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4485)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:4405)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4279)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4485)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:4405)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)

1 answer

Answer 0 (score: 2)

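I think your basic problem is this clause: ([^']*('')*)+
Both [^']* and ('')* can match an empty string, and the outer + repeats the whole group, so the same run of characters can be divided among the iterations in many different ways. That adds a lot of unnecessary steps, and the repeating Loop / GroupCurly frames in your stack trace suggest that Java matches each repetition of a group recursively, so the stack grows with the input.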
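A rough illustration of why that clause is expensive (a sketch, not a model of Java's matcher): count how many ways a piece of text can be cut into consecutive non-empty iterations of `[^']*('')*`. For a run of k ordinary characters the answer is 2^(k-1), because every gap between two characters either ends an iteration or does not. A backtracking engine only walks through these alternatives when a later part of the pattern fails, so the count shows how much room for wasted work there is, not the cost of every match. `SplitCounter` and `countSplits` are illustrative names.

```java
import java.util.regex.Pattern;

class SplitCounter {
    // One iteration of the outer "+" in ([^']*('')*)+ : a run of non-quotes, then zero or more '' pairs.
    static final Pattern ITERATION = Pattern.compile("[^']*(?:'')*");

    // Counts the ways s can be cut into consecutive non-empty pieces that each match ITERATION,
    // i.e. how many different ways the outer "+" could share the same text among its iterations.
    static long countSplits(String s) {
        long[] ways = new long[s.length() + 1];
        ways[0] = 1; // the empty prefix has exactly one (empty) split
        for (int end = 1; end <= s.length(); end++) {
            for (int start = 0; start < end; start++) {
                if (ways[start] != 0 && ITERATION.matcher(s.substring(start, end)).matches()) {
                    ways[end] += ways[start];
                }
            }
        }
        return ways[s.length()];
    }

    public static void main(String[] args) {
        System.out.println(countSplits("abc"));                        // 4
        System.out.println(countSplits("xyz2006201257200426888282d")); // 33554432 (2^25)
    }
}
```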

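Update:
You can replace it with an unrolled-loop version, which will significantly
reduce the overall number of steps: [^']*(?:''[^']*)*
It matches the same text, but every repetition of (?:''[^']*) must start with an escaped quote '', so a run of ordinary characters can be matched in only one way.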
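Both forms describe the same text: runs of non-quote characters, with '' (an escaped quote) allowed anywhere. A quick sketch to check that on a few sample strings, using complete matches via `matches()`; the class and method names are illustrative, and agreeing on samples is a sanity check, not a proof.

```java
import java.util.regex.Pattern;

class UnrolledEquivalence {
    // The clause from the question and its unrolled replacement.
    static final Pattern ORIGINAL = Pattern.compile("([^']*('')*)+");
    static final Pattern UNROLLED = Pattern.compile("[^']*(?:''[^']*)*");

    // True when both patterns give the same result for a complete match of s.
    static boolean agree(String s) {
        return ORIGINAL.matcher(s).matches() == UNROLLED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Text that may appear between the outer quotes; '' is an escaped quote.
        String[] samples = {"abc", "it''s", "a''b''", "''", "'", "a'b", "'''"};
        for (String s : samples) {
            System.out.println(s + " -> unrolled matches: " + UNROLLED.matcher(s).matches()
                    + ", same as original: " + agree(s));
        }
    }
}
```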

The rewritten regex:

"(\\s*'[^']*(?:''[^']*)*'(?:\\s*,\\s*'[^']*(?:''[^']*)*')*\\s*)|(\\s*[0-9.-]+(?:,\\s*[0-9.-]+)*\\s*)"
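A minimal sketch of using the rewritten regex from Java. `RewrittenPattern` and `isValidInList` are hypothetical names; the helper calls `matches()`, so the whole input has to be a list rather than merely contain one. The question's `MULTILINE | CASE_INSENSITIVE` flags are left out because the pattern has no `^`/`$` anchors and no literal letters. regex101's step count does not translate directly into Java's stack depth, so the 800-ID case is wrapped in a `try` and its result is only printed.

```java
import java.util.Collections;
import java.util.regex.Pattern;

class RewrittenPattern {
    // The rewritten regex from the answer, as a Java string literal.
    static final Pattern LIST_PATTERN = Pattern.compile(
            "(\\s*'[^']*(?:''[^']*)*'(?:\\s*,\\s*'[^']*(?:''[^']*)*')*\\s*)"
                    + "|(\\s*[0-9.-]+(?:,\\s*[0-9.-]+)*\\s*)");

    // matches() requires the entire input to be a quoted list or a numeric list.
    static boolean isValidInList(String input) {
        return LIST_PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        String value = "'xyz2006201257200426888282d','xyz2006201300193058314082d'";
        System.out.println(isValidInList(value));        // true
        System.out.println(isValidInList("1, 2.5, -3")); // true
        System.out.println(isValidInList("'a' OR 1=1")); // false

        // Stress input shaped like the answer's demo: 800 quoted IDs separated by commas.
        String big = String.join(",", Collections.nCopies(800, "'xyz2006201257200426888282d'"));
        try {
            System.out.println("800 IDs -> " + isValidInList(big));
        } catch (StackOverflowError e) {
            System.out.println("800 IDs -> StackOverflowError");
        }
    }
}
```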

In this demo the target is 800 copies of 'xyz2006201257200426888282d'
separated by commas, and matching it takes 8010 steps.

https://regex101.com/r/WVrPBb/1

Give it a try; in the worst case it will still overflow the stack.
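If a single regex still overflows on your JVM for very long lists, one workaround (a different technique, not part of the answer above) is to validate one item per `find()` call. `\G` pins each match to the end of the previous one, so items must follow each other with nothing in between, and because every item is a separate match, the depth of any one match depends on a single item rather than on the whole list. The class, pattern, and method names are hypothetical; the item patterns are cut out of the answer's regex and are intended to accept the same inputs.

```java
import java.util.Collections;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ItemByItem {
    // One quoted item plus what follows it: a comma, or the end of input (group 1 is then empty).
    static final Pattern QUOTED_ITEM =
            Pattern.compile("\\G\\s*'[^']*(?:''[^']*)*'\\s*(,|\\z)");
    // One numeric item; like the answer's second alternative, no whitespace is allowed before a comma.
    static final Pattern NUMBER_ITEM =
            Pattern.compile("\\G\\s*[0-9.-]+(,|\\s*\\z)");

    static boolean isList(Pattern item, String input) {
        Matcher m = item.matcher(input);
        while (m.find()) {
            if (!m.group(1).equals(",")) {
                return true; // group 1 reached the end of input, so the last item was consumed
            }
        }
        return false; // no item matched, something sits between items, or there is a trailing comma
    }

    static boolean isValidInList(String input) {
        return isList(QUOTED_ITEM, input) || isList(NUMBER_ITEM, input);
    }

    public static void main(String[] args) {
        String big = String.join(",", Collections.nCopies(10_000, "'xyz2006201257200426888282d'"));
        System.out.println(isValidInList(big));    // true
        System.out.println(isValidInList("'a',")); // false (trailing comma)
    }
}
```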

Readable version:

    (                             # (1 start)
         \s* 
         '
         [^']* 
         (?: '' [^']* )*
         ' 
         (?:
              \s* , \s* 
              '
              [^']* 
              (?: '' [^']* )*
              ' 
         )*
         \s* 
    )                             # (1 end)
 |  
    (                             # (2 start)
         \s* 
         [0-9.-]+ 
         (?:
              , \s* [0-9.-]+ 
         )*
         \s* 
    )                             # (2 end)
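The readable version above uses free-spacing syntax. A sketch, assuming you want to keep that layout in Java source: compile it with `Pattern.COMMENTS`, which makes Java ignore whitespace in the pattern and treat `#` through the end of the line as a comment. The lines are joined with `\n` because each comment runs to the end of its line. `ReadableRegex` is an illustrative name, and the test at the end only confirms that the two forms agree on a few sample inputs.

```java
import java.util.regex.Pattern;

class ReadableRegex {
    // The compact form from the answer.
    static final Pattern COMPACT = Pattern.compile(
            "(\\s*'[^']*(?:''[^']*)*'(?:\\s*,\\s*'[^']*(?:''[^']*)*')*\\s*)"
                    + "|(\\s*[0-9.-]+(?:,\\s*[0-9.-]+)*\\s*)");

    // The readable form; Pattern.COMMENTS ignores whitespace and "#" comments.
    static final Pattern READABLE = Pattern.compile(String.join("\n",
            "(                       # (1 start)",
            "  \\s* ' [^']* (?: '' [^']* )* '",
            "  (?: \\s* , \\s* ' [^']* (?: '' [^']* )* ' )*",
            "  \\s*",
            ")                       # (1 end)",
            "|",
            "(                       # (2 start)",
            "  \\s* [0-9.-]+ (?: , \\s* [0-9.-]+ )* \\s*",
            ")                       # (2 end)"),
            Pattern.COMMENTS);

    public static void main(String[] args) {
        String value = "'xyz2006201257200426888282d','xyz2006201300193058314082d'";
        System.out.println(READABLE.matcher(value).matches()); // true
    }
}
```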