Java regex to filter out comments not working as expected

Posted: 2016-08-31 19:01:40

Tags: java regex parsing comments regex-lookarounds

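I put together this simplified version of the code to demonstrate the problem:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main { // wrapper class added so the snippet compiles; the name Main is arbitrary
    public static void main(String[] args) {
        String content = "1 [thing i want]\n" +
            "2 [thing i dont want]\n" +
            "3 [thing i dont want] [thing i want]\n" +
            "4 // [thing i want]\n" +
            "5 [thing i want]  // [thing i want]\n";

        String BASE_REGEX = "(?!//)\\[%s\\]";
        Pattern myRegex = Pattern.compile(String.format(BASE_REGEX, "thing i want"));
        Matcher m = myRegex.matcher(content);
        System.out.println("match? " + m);
        String newContent = m.replaceAll("best thing ever");
        System.out.println("regex " + myRegex);
        System.out.println("content:\n" + content);
        System.out.println("new content:\n" + newContent);
    }
}

I want the output to be (with the occurrences after // left unmodified):

new content:
1 best thing ever
2 [thing i dont want]
3 [thing i dont want] best thing ever
4 // [thing i want]
5 best thing ever  // [thing i want]

But instead every [thing i want] is replaced, including the ones after //:

new content:
1 best thing ever
2 [thing i dont want]
3 [thing i dont want] best thing ever
4 // best thing ever
5 best thing ever  // best thing ever

How do I fix the regex?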

1 Answer:

Answer (score: 1)

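There is no really simple way to test whether something is inside an inline comment. The Java regex engine can look behind, but only over a bounded "distance": it allows variable-length lookbehind only when the maximum length is finite (a quantifier such as {0,100} is accepted inside a lookbehind, while * and + are not). I'm not sure a pattern built on this feature would be very efficient.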
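For comparison, here is a minimal sketch of the bounded-lookbehind route mentioned above. It is a swapped-in alternative, not the approach this answer recommends. In the question's pattern, (?!//) is a lookahead tested at the position where \[ must match; that character is always [, so the lookahead never sees // and always succeeds. A negative lookbehind instead inspects the text before the target. The limit of 100 characters and the names BoundedLookbehindDemo and replaceOutsideComments are hypothetical; a // further back than that limit is not detected, and any // (for example inside a URL) counts as a comment start.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookbehindDemo {
    // Hypothetical limit: a "//" more than this many characters before the target is not seen.
    static final int MAX_COMMENT_DISTANCE = 100;

    // Replaces [target] unless "//" appears earlier on the same line,
    // at most MAX_COMMENT_DISTANCE characters before the opening bracket.
    static String replaceOutsideComments(String content, String target, String replacement) {
        // '.' does not match '\n' by default, so the lookbehind cannot reach into a previous line.
        // The lookbehind has a finite maximum length (2 + MAX_COMMENT_DISTANCE), so Java accepts it.
        String regex = "(?<!//.{0," + MAX_COMMENT_DISTANCE + "})"
                + "\\[" + Pattern.quote(target) + "\\]";
        return Pattern.compile(regex).matcher(content)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String content = "1 [thing i want]\n"
                + "2 [thing i dont want]\n"
                + "3 [thing i dont want] [thing i want]\n"
                + "4 // [thing i want]\n"
                + "5 [thing i want]  // [thing i want]\n";
        System.out.print(replaceOutsideComments(content, "thing i want", "best thing ever"));
    }
}
```

Each candidate position makes the engine retry the lookbehind for up to about a hundred start offsets, which is the efficiency concern raised above.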

What you can do instead is check everything from the start of each line:

(?m)((?:\G|^)[^\[/\n]*+(?:\[(?!thing i want\])[^\[/\n]*|/(?!/)[^\[/\n]*)*+)\[thing i want\]

(Escape every backslash to write the pattern as a Java string literal.)

Replacement:

$1best thing ever
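A minimal sketch of the pattern and replacement above applied to the question's input, with every backslash doubled for the Java string literal. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class LineStartAnchoredDemo {
    // The answer's pattern, written as a Java string literal (each backslash doubled).
    static final Pattern TARGET_OUTSIDE_COMMENT = Pattern.compile(
            "(?m)((?:\\G|^)[^\\[/\\n]*+(?:\\[(?!thing i want\\])[^\\[/\\n]*|/(?!/)[^\\[/\\n]*)*+)\\[thing i want\\]");

    static String replace(String content) {
        // $1 writes back the text captured before the target, so only the target itself changes.
        return TARGET_OUTSIDE_COMMENT.matcher(content).replaceAll("$1best thing ever");
    }

    public static void main(String[] args) {
        String content = "1 [thing i want]\n"
                + "2 [thing i dont want]\n"
                + "3 [thing i dont want] [thing i want]\n"
                + "4 // [thing i want]\n"
                + "5 [thing i want]  // [thing i want]\n";
        System.out.print(replace(content));
    }
}
```

On line 5 the first target is replaced; the following attempt starts at \G, consumes the two spaces, then hits // and fails, and because the quantifiers are possessive nothing is retried, so the commented target is left alone.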

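Explanation: the idea is to capture everything from the start of the line, or from the previous target on the same line, up to the target. This way you can describe exactly what is allowed before the target: anything except the target itself or two consecutive slashes.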
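The question builds its pattern from BASE_REGEX with String.format. The same can be done with this answer's pattern for any target text. This is a sketch under the assumption that the target should match literally, hence Pattern.quote; the names buildPattern and replaceOutsideComments are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TargetPatternBuilder {
    // Builds the answer's pattern for an arbitrary bracketed target.
    // Pattern.quote wraps the target in \Q...\E so characters such as '.' match literally.
    static Pattern buildPattern(String target) {
        String t = Pattern.quote(target);
        String regex = "(?m)("
                + "(?:\\G|^)[^\\[/\\n]*+"             // line start or end of previous match, then plain text
                + "(?:\\[(?!" + t + "\\])[^\\[/\\n]*" // a [ that does not open the target
                + "|/(?!/)[^\\[/\\n]*)*+"              // or a single / that does not start //
                + ")\\[" + t + "\\]";                  // close group $1, then the target itself
        return Pattern.compile(regex);
    }

    static String replaceOutsideComments(String content, String target, String replacement) {
        Matcher m = buildPattern(target).matcher(content);
        // quoteReplacement stops '$' or '\' in the replacement from being read as group syntax.
        return m.replaceAll("$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String content = "1 [thing i want]\n"
                + "2 [thing i dont want]\n"
                + "3 [thing i dont want] [thing i want]\n"
                + "4 // [thing i want]\n"
                + "5 [thing i want]  // [thing i want]\n";
        System.out.print(replaceOutsideComments(content, "thing i want", "best thing ever"));
    }
}
```

Quoting matters once the target contains metacharacters: with target "a.b", the line "y [axb]" is left untouched because the dot is matched literally.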

(?m) # switch multiline mode on: ^ means "start of a line"
(    # open capture group $1
    (?:    # non-capturing group: two possible starting points
        \G # right after the previous match (on the same line)
      |    # OR
        ^  # at the start of the line
    )

    [^\[/\n]*+ # everything that is not an opening bracket, a slash or a newline;
               # * means "zero or more times", and the extra + forbids
               # backtracking into this part if the pattern fails later
               # ("*+" is called a "possessive quantifier")
    (?:
        \[                   # literal [
        (?!thing i want\])   # not followed by "thing i want]"
        [^\[/\n]*
      |                      # OR
        /                    # literal /
        (?!/)                # not followed by another /
        [^\[/\n]*
    )*+  # zero or more times, possessive
) # close capture group $1
\[thing i want\] # the target
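The annotated layout above is close to what Java accepts with the Pattern.COMMENTS flag, where whitespace in the pattern is ignored and # starts a comment that runs to the end of the line. One adjustment is needed: in that mode the spaces inside "thing i want" would be ignored too, so this sketch writes them as \x20. The class and field names are hypothetical.

```java
import java.util.regex.Pattern;

public class CommentedPatternDemo {
    // Every line ends with a real "\n" so that each # comment stops at the end of its line.
    static final String COMMENTED =
              "(?m)                  # multiline: ^ matches at the start of every line\n"
            + "(                     # open capture group $1\n"
            + "  (?:\\G|^)           # right after the previous match, or at a line start\n"
            + "  [^\\[/\\n]*+         # no opening bracket, slash or newline; possessive\n"
            + "  (?:\n"
            + "      \\[(?!thing\\x20i\\x20want\\])[^\\[/\\n]*   # a bracket that does not open the target\n"
            + "    | /(?!/)[^\\[/\\n]*                         # a slash not followed by another slash\n"
            + "  )*+                 # zero or more times, possessive\n"
            + ")                     # close capture group $1\n"
            + "\\[thing\\x20i\\x20want\\]                     # the target\n";

    // COMMENTS mode ignores the layout whitespace and the # comments above.
    static final Pattern PATTERN = Pattern.compile(COMMENTED, Pattern.COMMENTS);

    static String replace(String content) {
        return PATTERN.matcher(content).replaceAll("$1best thing ever");
    }

    public static void main(String[] args) {
        String content = "1 [thing i want]\n"
                + "2 [thing i dont want]\n"
                + "3 [thing i dont want] [thing i want]\n"
                + "4 // [thing i want]\n"
                + "5 [thing i want]  // [thing i want]\n";
        System.out.print(replace(content));
    }
}
```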