Java regex: "Look-behind group does not have an obvious maximum length" error

Time: 2014-07-21 20:45:09

Tags: java regex

I know that Java regex does not support variable-length look-behinds, and that the following should cause an error:

(?<=(not exceeding|no((\\w|\\s)*)more than))xxxx

But when the * is replaced with a bounded quantifier

(?<=(not exceeding|no((\\w|\\s){0,30})more than))xxxx

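it still fails. Why is that?

4 answers:

Answer 0 (score: 10)

Java Lookbehind is Notoriously Buggy

So you thought Java did not support infinite lookbehind?

But the following pattern will compile!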

(?<=\d+)\w+

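...although in a match-all it produces unexpected results (see demo).

On the other hand, you can successfully use this other infinite lookbehind (which I was very surprised to discover in this question)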

(?<=\\G\\d+,\\d+,\\d+),

to split this string: 0,123,45,6789,4,5,3,4,6000

It correctly outputs (see online demo):

0,123,45
6789,4,5
3,4,6000

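This time the results are what you would expect.

But if you tweak the regex slightly to get pairs instead of triplets, using the pattern (?<=\\G\\d+,\\d+), (trailing comma included), this time it will not split (see the demo).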
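Since the lookbehind-based split behaves inconsistently, here is a minimal sketch of one way to get the same triplets (or pairs) without any lookbehind, by matching the chunks directly instead of splitting. This is a swapped-in technique, not what the answer above uses, and the class name `ChunkSplitter` and helper `chunks` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChunkSplitter {
    // Hypothetical helper: groups comma-separated numbers into chunks of
    // at most `size` items by matching each chunk (no lookbehind involved).
    static List<String> chunks(String input, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        // One number, then greedily up to size-1 further ",number" items.
        Pattern p = Pattern.compile("\\d+(?:,\\d+){0," + (size - 1) + "}");
        Matcher m = p.matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "0,123,45,6789,4,5,3,4,6000";
        // Print one chunk per line: first as triplets, then as pairs.
        for (String chunk : chunks(input, 3)) {
            System.out.println(chunk);
        }
        System.out.println("--");
        for (String chunk : chunks(input, 2)) {
            System.out.println(chunk);
        }
    }
}
```

Because the quantifier is greedy, a trailing group with fewer than `size` items (such as the lone 6000 when asking for pairs) is still returned as its own chunk.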


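The Bottom Line

Java lookbehind is notoriously buggy. Knowing this, I suggest you not waste time trying to understand why it does something that is not documented.

The decisive words that led me to this conclusion some time ago came from Jan Goyvaerts, co-author of The Regex Cookbook and an uber regex guru who has created a terrific regex engine and needs to stay on top of most regex flavors under the sun for his debugging tool, RegexBuddy:

Java has a number of bugs in its lookbehind implementation. Some (but not all) of those were fixed in Java 6.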

Answer 1 (score: 4)

This is indeed strange. I could not find an explanation, but the problem seems to go away if you change (\\w|\\s){0,30} to [\\w\\s]{0,30}:

Pattern.compile("(?<=(not exceeding|no([\\w\\s]{0,30})more than))xxxx");
//BTW you don't need ^-----------------------------------------^ these parentheses
//unless you want to use match from this group later
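As a quick check of this workaround, the sketch below compiles the character-class version (with the redundant parentheses dropped) and runs it against a few inputs. The class name `CharClassLookbehind`, the helper `hasLimit`, and the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

class CharClassLookbehind {
    // Character class [\w\s] instead of the (\w|\s) alternation;
    // the outer and inner capturing groups are removed as well.
    static final Pattern LIMIT =
            Pattern.compile("(?<=not exceeding|no[\\w\\s]{0,30}more than)xxxx");

    // True if "xxxx" occurs directly after one of the two phrases.
    static boolean hasLimit(String text) {
        return LIMIT.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasLimit("not exceedingxxxx"));
        System.out.println(hasLimit("no more thanxxxx"));
        System.out.println(hasLimit("at mostxxxx"));
    }
}
```

Note that the pattern, like the one in the question, expects xxxx to follow the phrase immediately, with no space in between.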

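Answer 2 (score: 1)

Java regex does not support variable-length look-behinds

This is not entirely true. Java supports variable-length lookbehinds as long as they are bounded: for example, (?<=.{0,1000}) is allowed, as is something like (?<=ab?)c or (?<=abc|defgh).

But Java does not support them when there is no limit at all.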
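To make the bounded case concrete, here is a small sketch using the (?<=ab?)c pattern mentioned above: the lookbehind can match either "a" or "ab", so its length varies but never exceeds two characters. The class name `BoundedLookbehind` and the helper `found` are hypothetical.

```java
import java.util.regex.Pattern;

class BoundedLookbehind {
    // Variable-length but bounded look-behind: "a" or "ab" before "c".
    static final Pattern P = Pattern.compile("(?<=ab?)c");

    // True if some "c" in the input is preceded by "a" or "ab".
    static boolean found(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "ac", "bc", "c"}) {
            System.out.println(s + " -> " + found(s));
        }
    }
}
```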

So, what is not obvious to the Java regex engine inside a lookbehind subpattern is this:

a {m,n} quantifier applied to a non-fixed-length subpattern:

(?:abc){0,1} is allowed

(?:ab?)?     is allowed
(?:ab|de)    is allowed
(?:ab|de)?   is allowed

(?:ab?){0,1}   is not allowed
(?:ab|de){1}   is not allowed
(?:ab|de){0,1} is not allowed # in my opinion, it is because of the alternation.
                              # When an alternation is detected, the analysis
                              # stops immediately

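To get this error message in this particular case, two conditions must both hold:

  • a potentially variable-length subpattern (i.e. one that contains a quantifier, an alternation, or a backreference)

  • a {m,n} type quantifier.

None of these cases looks obvious to the user, since the distinction seems like an arbitrary choice. However, I think the real reason is to limit the time the regex engine spends pre-analyzing the pattern.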
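The list above can be checked on your own JDK with a small probe like the following sketch, which wraps each subpattern in a lookbehind and reports whether it compiles. The class name `LookbehindProbe` and helper `compiles` are hypothetical, and the allowed/not allowed results may differ from the list above depending on the Java version, so no particular output is claimed here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LookbehindProbe {
    // Returns true if the regex compiles, false on a syntax error
    // (which includes the "obvious maximum length" complaint).
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Subpatterns taken from the list above.
        String[] bodies = {
            "(?:abc){0,1}", "(?:ab?)?", "(?:ab|de)", "(?:ab|de)?",
            "(?:ab?){0,1}", "(?:ab|de){1}", "(?:ab|de){0,1}"
        };
        for (String body : bodies) {
            String regex = "(?<=" + body + ")x";
            System.out.println(regex + " -> "
                    + (compiles(regex) ? "allowed" : "not allowed"));
        }
    }
}
```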

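Answer 3 (score: 0)

Here are some test cases (I removed the redundant parens, as @Pshemo mentioned). It only fails where the lookbehind contains a sub-alternation. The error is

Look-behind group does not have an obvious maximum length near index 45

"Obvious" is the key word here.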

import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class Test {
    public static void main(String[] ignored) {
        // Alternatives made of literals or fixed repetitions of a single character.
        test("(?<=not exceeding|no)xxxx");
        test("(?<=not exceeding|NOT EXCEEDING)xxxx");
        test("(?<=not exceeding|x{13})xxxx");
        test("(?<=not exceeding|x{12}x)xxxx");
        // Alternatives where a quantified group itself contains an alternation.
        test("(?<=not exceeding|(x|y){12}x)xxxx");
        test("(?<=not exceeding|no(\\w|\\s){2,30}more than)xxxx");
        test("(?<=not exceeding|no(\\w|\\s){0,2}more than)xxxx");
        test("(?<=not exceeding|no(\\w|\\s){2}more than)xxxx");
    }

    // Tries to compile the regex and prints either "Success" or the syntax error.
    private static void test(String regex) {
        System.out.print("testing \"" + regex + "\"...");
        try {
            Pattern.compile(regex);
            System.out.println("Success");
        } catch (PatternSyntaxException x) {
            System.out.println(x);
        }
    }
}

Output:

testing "(?<=not exceeding|no)xxxx"...Success
testing "(?<=not exceeding|NOT EXCEEDING)xxxx"...Success
testing "(?<=not exceeding|x{13})xxxx"...Success
testing "(?<=not exceeding|x{12}x)xxxx"...Success
testing "(?<=not exceeding|(x|y){12}x)xxxx"...java.util.regex.PatternSyntaxException: Look-behind group does not have an obvious maximum length near index 27
(?<=not exceeding|(x|y){12}x)xxxx
                           ^
testing "(?<=not exceeding|no(\w|\s){2,30}more than)xxxx"...java.util.regex.PatternSyntaxException: Look-behind group does not have an obvious maximum length near index 41
(?<=not exceeding|no(\w|\s){2,30}more than)xxxx
                                         ^
testing "(?<=not exceeding|no(\w|\s){0,2}more than)xxxx"...java.util.regex.PatternSyntaxException: Look-behind group does not have an obvious maximum length near index 40
(?<=not exceeding|no(\w|\s){0,2}more than)xxxx
                                        ^
testing "(?<=not exceeding|no(\w|\s){2}more than)xxxx"...java.util.regex.PatternSyntaxException: Look-behind group does not have an obvious maximum length near index 38
(?<=not exceeding|no(\w|\s){2}more than)xxxx
                                      ^