I want to remove all types of comments from a Java source file. Example:
String str1 = "SUM 10" /*This is a Comments */ ;
String str2 = "SUM 10"; //This is a Comments"
String str3 = "http://google.com"; /*This is a Comments*/
String str4 = "('file:///xghsghsh.html/')"; //Comments
String str5 = "{\"temperature\": {\"type\"}}"; //comments
Expected output:
String str1 = "SUM 10";
String str2 = "SUM 10";
String str3 = "http://google.com";
String str4 = "('file:///xghsghsh.html/')";
String str5 = "{\"temperature\": {\"type\"}}";
I am using the following regular expression to do this:
System.out.println(str1.replaceAll("[^:]//.*|/\\\\*((?!=*/)(?s:.))+\\\\*/", ""));
This gives me wrong results for str4 and str5. Please help me fix this.
Using Andreas's solution:
final String regex = "//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\\r\\n\"])*\")";
final String string = " String str1 = \"SUM 10\" /*This is a Comments */ ; \n"
+ " String str2 = \"SUM 10\"; //This is a Comments\" \n"
+ " String str3 = \"http://google.com\"; /*This is a Comments*/\n"
+ " String str4 = \"('file:///xghsghsh.html/')\"; //Comments\n"
+ " String str5 = \"{\"temperature\": {\"type\"}}"; //comments";
final String subst = "$1";
// The substituted value will be contained in the result variable
final String result = string.replaceAll(regex,subst);
System.out.println("Substitution result: " + result);
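It works for everything except str5.
Answer 0: (score: 4)
To make this work, you need to "skip" string literals. You can do that by matching string literals and capturing them, so that they are kept in the output.
The following regex does that, using $1 as the replacement string: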
//.*|/\*(?s:.*?)\*/|("(?:(?<!\\)(?:\\\\)*\\"|[^\r\n"])*")
For a demo, see regex101.
Then the Java code:
str1.replaceAll("//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\r\n\"])*\")", "$1")
Explanation
//.*                      Match // and rest of line
|                         or
/\*(?s:.*?)\*/            Match /* and */, with any characters in-between, incl. linebreaks
|                         or
("                        Start capture group and match "
(?:                       Start repeating group:
   (?<!\\)(?:\\\\)*\\"    Match escaped " optionally prefixed by escaped \'s
   |                      or
   [^\r\n"]               Match any character except " and linebreak
)*                        End of repeating group
")                        Match terminating ", and end of capture group
$1                        Keep captured string literal
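As a quick check, here is a minimal, self-contained sketch that applies the pattern above to the five sample lines from the question. The class name RegexCommentStripper and the strip helper are names I made up for illustration; only java.util.regex is used. Note that str5 is written here with the inner quotes properly escaped in the Java source (\\\"), which is an assumption about what the question intended.

```java
import java.util.regex.Pattern;

class RegexCommentStripper {
    // Same pattern as above: line comment | block comment | (string literal, captured as group 1)
    private static final Pattern COMMENT_OR_STRING = Pattern.compile(
        "//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\r\n\"])*\")");

    // Comments are replaced by nothing; string literals are replaced by themselves ($1).
    static String strip(String source) {
        return COMMENT_OR_STRING.matcher(source).replaceAll("$1");
    }

    public static void main(String[] args) {
        String source = String.join("\n",
            "String str1 = \"SUM 10\" /*This is a Comments */ ;",
            "String str2 = \"SUM 10\"; //This is a Comments\"",
            "String str3 = \"http://google.com\"; /*This is a Comments*/",
            "String str4 = \"('file:///xghsghsh.html/')\"; //Comments",
            "String str5 = \"{\\\"temperature\\\": {\\\"type\\\"}}\"; //comments");
        System.out.println(strip(source));
    }
}
```

The whitespace that surrounded a removed comment is left in place, so you may want to trim lines afterwards. The pattern only knows about string literals, so a char literal such as '"' would probably still confuse it.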
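Answer 1: (score: 0)
As others have said, regex is not a great choice here.
You can use a simple DFA for this task.
Here is an example that gives you the span of a multi-line comment (/* */).
You can do the same for single-line comments (// -- \n).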
String input = ...; //here's your input String
//0 - source code,
//1 - multiple lines comment (start) (/ char)
//2 - multiple lines comment (start) (* char)
//3 - multiple lines comment (finish) (* char)
//4 - multiple lines comment (finish) (/ char)
byte state = 0;
int startPos = -1;
int endPos = -1;
for (int i = 0; i < input.length(); i++) {
    switch (state) {
        case 0:
            if (input.charAt(i) == '/') {
                state = 1;
                startPos = i;
            }
            break;
        case 1:
            if (input.charAt(i) == '*') {
                state = 2;
            }
            break;
        case 2:
            if (input.charAt(i) == '*') {
                state = 3;
            }
            break;
        case 3:
            if (input.charAt(i) == '/') {
                state = 0;
                endPos = i+1;
                //here you have the comment between startPos and endPos indices,
                //you can do whatever you want with it
            }
            break;
        default:
            break;
    }
}
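For comparison, here is a fuller sketch of the same state-machine idea that also covers // comments and skips string and char literals. It is not the code from this answer: the class name CommentScanner, the state names, and the strip method are hypothetical names chosen for illustration. It falls back to plain code when the character after a slash is neither / nor *, and it returns to the inside of the comment when a * is not followed by /. It assumes ordinary literals only; Java text blocks (""") and unicode escapes are not handled.

```java
class CommentScanner {
    private enum State { CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR, LITERAL, LITERAL_ESCAPE }

    static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        State state = State.CODE;
        char quote = '"'; // delimiter of the literal currently being read
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (state) {
                case CODE:
                    if (c == '/') {
                        state = State.SLASH; // might start a comment; hold the slash back
                    } else {
                        if (c == '"' || c == '\'') { quote = c; state = State.LITERAL; }
                        out.append(c);
                    }
                    break;
                case SLASH:
                    if (c == '/') {
                        state = State.LINE_COMMENT;
                    } else if (c == '*') {
                        state = State.BLOCK_COMMENT;
                    } else {
                        out.append('/');     // it was an ordinary slash (e.g. division)
                        state = State.CODE;
                        i--;                 // re-read c as normal code
                    }
                    break;
                case LINE_COMMENT:
                    if (c == '\n' || c == '\r') { out.append(c); state = State.CODE; }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*') state = State.BLOCK_STAR;
                    break;
                case BLOCK_STAR:
                    if (c == '/') state = State.CODE;
                    else if (c != '*') state = State.BLOCK_COMMENT; // a lone * inside the comment
                    break;
                case LITERAL:
                    out.append(c);
                    if (c == '\\') state = State.LITERAL_ESCAPE;
                    else if (c == quote) state = State.CODE;
                    break;
                case LITERAL_ESCAPE:
                    out.append(c);           // escaped character, never ends the literal
                    state = State.LITERAL;
                    break;
            }
        }
        if (state == State.SLASH) out.append('/'); // input ended on a lone slash
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("String str3 = \"http://google.com\"; /*This is a Comments*/"));
    }
}
```

Like the regex approach, this leaves the whitespace around a removed comment untouched.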
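Answer 2: (score: 0)
{...wish I could comment...}
I suggest processing in two passes: one for end-of-line comments (//) and another for comments that are not line-based (/* */).
I like Pavel's idea; however, I can't see how it checks that the star is the very next character after the slash, and vice versa when closing.
I like Andreas's idea; however, I couldn't get it to work for multi-line comments.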
https://docs.oracle.com/javase/specs/jls/se12/html/jls-3.html#jls-CommentTail
Answer 3: (score: -1)
Perhaps it would be better to go step by step with several simple expressions, such as:
.*(\s*\/\*.*|\s*\/\/.*)
to remove the inline comments first.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "(.*)(\\s*\\/\\*.*|\\s*\\/\\/.*)";
final String string = " String str1 = \"SUM 10\" /*This is a Comments */ ; \n"
+ " String str2 = \"SUM 10\"; //This is a Comments\" \n"
+ " String str3 = \"http://google.com\"; /*This is a Comments*/\n"
+ " String str4 = \"('file:///xghsghsh.html/')\"; //Comments\n"
+ " String str5 = \"{\\\"temperature\\\": {\\\"type\\\"}}\"; //comments";
final String subst = "$1"; // Java replacement strings reference groups with $1; "\\1" would insert a literal 1
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
// The substituted value will be contained in the result variable
final String result = matcher.replaceAll(subst);
System.out.println("Substitution result: " + result);