The following expression:
^(#ifdef FEATURE)+?\s*$((\r\n.*?)*^(#endif)+\s*[\/\/]*\s*(end of)*\s*FEATURE)+?$
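overflows during matching when I run the compiled .jar file.
The string to match can look something like this:
this is a junk line
#ifdef FEATURE
#endif // end of FEATURE
this is a junk line
#ifdef FEATURE
this is a junk line that should be matched: HOLasduiqwhei & // FEATURE fjfefj #endif // h
#endif FEATURE
this is a junk line
So the #ifdef FEATURE ... #endif blocks should be matched. Here is the error: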
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)
... (the same six frames keep repeating)
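Any strategy for avoiding backtracking, or any improvement to the expression, is welcome. I have already tried atomic groups (?>), but for some reason they did not simplify things.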
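As far as I know, java.util.regex matches a greedy * applied to a group by recursing once per repetition (the repeating GroupHead/Loop/GroupTail frames in the trace above), and wrapping that loop in an atomic group (?>...) still runs the same recursive loop. A possessive quantifier on the group (*+) is instead matched iteratively. Because each repetition already refuses to consume the closing #endif line, giving up backtracking into the loop should not change what matches. This is a sketch under those assumptions: the class and method names are hypothetical, the unused capturing group is dropped, and Pattern.quote is added in case a feature name contains regex metacharacters.

```java
import java.util.regex.Pattern;

final class PossessiveStrip {
    // Hypothetical helper: same structure as the question's regex, except that
    //  - Pattern.quote() escapes the feature name (assumption: names may contain metacharacters)
    //  - the body loop uses the possessive quantifier *+, which (as far as I know)
    //    the JDK matches with a loop instead of one recursive call per line
    static Pattern blockPattern(String feature) {
        String name = Pattern.quote(feature);
        String endif = "#endif (?:// end of )?" + name + "$";
        return Pattern.compile(
                "^#ifdef " + name
                + "(?:\\r?\\n(?!" + endif + ").*)*+" // body lines, never the closing #endif line
                + "\\r?\\n" + endif,
                Pattern.MULTILINE);
    }

    // Removes every block for the given feature, like the question's strip()
    static String strip(String text, String feature) {
        return blockPattern(feature).matcher(text).replaceAll("");
    }
}
```

For example, text = PossessiveStrip.strip(text, patterns.get(i)) could replace the body of the loop in the code below.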
Here is the code:
public String strip(String text) {
    // readFile() is my own helper (not shown); it returns the disabled feature names
    ArrayList<String> patterns = readFile("Disabled_Features.txt");
    for (int i = 0; i < patterns.size(); ++i) {
        // Remove every "#ifdef NAME ... #endif [// end of ]NAME" block for this feature
        Pattern todoPattern = Pattern.compile(
                "^#ifdef " + patterns.get(i)
                + "((?:\\r?\\n(?!#endif (?:// end of )?" + patterns.get(i) + "$).*)*)"
                + "\\r?\\n#endif (?:// end of )?" + patterns.get(i) + "$",
                Pattern.MULTILINE);
        Matcher m = todoPattern.matcher(text);
        text = m.replaceAll("");
    }
    return text;
}
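Another way to avoid backtracking altogether is to drop the regex and strip the blocks line by line. This is a minimal sketch, assuming blocks are not nested and that the closing line is exactly "#endif NAME" or "#endif // end of NAME", as in the regex above; LineStripper is a hypothetical name. Unlike replaceAll(""), it removes the #ifdef/#endif lines completely (no empty line is left behind), normalizes line endings to \n, and leaves an unterminated block unchanged.

```java
import java.util.ArrayList;
import java.util.List;

final class LineStripper {
    // Hypothetical regex-free alternative: scans line by line, so there is
    // nothing to backtrack and no recursion that grows with the block size.
    // Assumptions: blocks are not nested, and the closing line is exactly
    // "#endif NAME" or "#endif // end of NAME".
    static String strip(String text, String feature) {
        String open = "#ifdef " + feature;
        String closeShort = "#endif " + feature;
        String closeLong = "#endif // end of " + feature;

        List<String> kept = new ArrayList<>();
        List<String> pending = new ArrayList<>(); // lines of a block that is not closed yet
        boolean inBlock = false;

        for (String line : text.split("\\r?\\n", -1)) { // -1 keeps trailing empty lines
            if (!inBlock) {
                if (line.equals(open)) {
                    inBlock = true;
                    pending.add(line);
                } else {
                    kept.add(line);
                }
            } else if (line.equals(closeShort) || line.equals(closeLong)) {
                inBlock = false; // block closed: drop all of its lines
                pending.clear();
            } else {
                pending.add(line);
            }
        }
        kept.addAll(pending); // an unterminated block is left unchanged
        return String.join("\n", kept);
    }
}
```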
Answer 0 (score: 0)
I tried the code @Wiktor wrote, and it works fine:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestRegex {
    public static void main(String[] args) {
        String text = "this is a junk line\n" +
                "\n" +
                "#ifdef FEATURE \n" +
                "#endif // end of FEATURE\n" +
                "\n" +
                "this is a junk line\n" +
                "\n" +
                "#ifdef FEATURE\n" +
                "\n" +
                "this is a junk line that should be matched: HOLasduiqwhei & // FEATURE fjfefj #endif // h\n" +
                "\n" +
                "#endif FEATURE\n" +
                "\n" +
                "this is a junk line";
        // this version does not use Pattern.MULTILINE, this should reduce the backtracking
        Matcher matcher2 = Pattern.compile("\\n#ifdef FEATURE((?:\\r?\\n(?!#endif (?:// end of )?FEATURE).*)*)\\r?\\n#endif (?:// end of )?FEATURE").matcher(text);
        while (matcher2.find()) {
            System.out.println(matcher2.group());
        }
    }
}
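This makes me think your problem is caused by the size of the input file.
So if your file is very large, you can provide the input as a CharSequence implementation, which lets you wrap a large text file. Why? Because Pattern.matcher(), which creates a Matcher from a Pattern, takes a CharSequence as its argument.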
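One ready-made CharSequence is java.nio.CharBuffer, which can be passed directly to Pattern.matcher(). This is a sketch assuming a UTF-8 file smaller than 2 GB (the FileChannel.map limit); MappedMatch and its methods are hypothetical names. Decoding still materialises every character, so this mainly avoids building an intermediate String; it does not change the pattern itself.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MappedMatch {
    // Maps the file read-only and decodes it into a CharBuffer.
    // Assumption: the file is UTF-8; malformed bytes are replaced by decode().
    static CharBuffer mapAsChars(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer bytes = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return StandardCharsets.UTF_8.decode(bytes);
        }
    }

    // CharBuffer implements CharSequence, so it goes straight into Pattern.matcher()
    static int countMatches(Path file, Pattern pattern) throws IOException {
        Matcher m = pattern.matcher(mapAsChars(file));
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```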
Answer 1 (score: 0)
Update:
I tried implementing Wiktor's solution:
"^#ifdef "+patterns.get(i)+"((?:\\r?\\n(?!#endif (?:// end of )?"+patterns.get(i)+"$).*)*)\\r?\\n#endif (?:// end of )?"+patterns.get(i)+"$"
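and it captures only the second block, but not the following one:
#ifdef FEATURE
junk text that should be captured
#endif // end of FEATURE
In any case, the jar still overflows when I run it.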
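One possible reason a block is skipped, assuming the real file contains trailing whitespace: the pattern requires the line break immediately after the feature name, so a line such as "#ifdef FEATURE " (the first #ifdef in the test text of answer 0 ends with a space) never starts a match. This is a sketch that tolerates spaces or tabs after #ifdef/#endif, around "// end of", and at the end of both lines, with the body loop kept possessive (*+); TolerantStrip is a hypothetical name.

```java
import java.util.regex.Pattern;

final class TolerantStrip {
    // Hypothetical variant of the pattern above that accepts spaces or tabs
    // after "#ifdef"/"#endif", around "// end of", and at the end of both lines.
    static Pattern blockPattern(String feature) {
        String name = Pattern.quote(feature);
        String ws = "[ \\t]";
        String endif = "#endif" + ws + "+(?://" + ws + "*end of" + ws + "+)?" + name + ws + "*$";
        return Pattern.compile(
                "^#ifdef" + ws + "+" + name + ws + "*"
                + "(?:\\r?\\n(?!" + endif + ").*)*+" // body lines (possessive), never the closing line
                + "\\r?\\n" + endif,
                Pattern.MULTILINE);
    }

    static String strip(String text, String feature) {
        return blockPattern(feature).matcher(text).replaceAll("");
    }
}
```

On the test text from answer 0, this removes both blocks, including the one whose #ifdef line ends with a space.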