Removing comments from source code in Java

Time: 2019-07-05 16:26:18

Tags: java regex

I want to remove every kind of comment from a Java source file. Example:

    String str1 = "SUM 10"      /*This is a Comments */ ;   
    String str2 = "SUM 10";     //This is a Comments"  
    String str3 = "http://google.com";   /*This is a Comments*/
    String str4 = "('file:///xghsghsh.html/')";  //Comments
    String str5 = "{\"temperature\": {\"type\"}}";  //comments

Expected output:

    String str1 = "SUM 10"; 
    String str2 = "SUM 10";  
    String str3 = "http://google.com";
    String str4 = "('file:///xghsghsh.html/')";
    String str5 = "{\"temperature\": {\"type\"}}";

I am using the following regular expression to do this:

    System.out.println(str1.replaceAll("[^:]//.*|/\\\\*((?!=*/)(?s:.))+\\\\*/", ""));

This gives me wrong results for str4 and str5. Please help me fix this.

Using Andreas's solution:

        final String regex = "//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\\r\\n\"])*\")";
        final String string = "    String str1 = \"SUM 10\"      /*This is a Comments */ ;   \n"
             + "    String str2 = \"SUM 10\";     //This is a Comments\"  \n"
             + "    String str3 = \"http://google.com\";   /*This is a Comments*/\n"
             + "    String str4 = \"('file:///xghsghsh.html/')\";  //Comments\n"
             + "    String str5 = \"{\"temperature\": {\"type\"}}";  //comments";
        final String subst = "$1";

        // The substituted value will be contained in the result variable
        final String result = string.replaceAll(regex,subst);

        System.out.println("Substitution result: " + result);

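It works for everything except str5.

4 answers:

Answer 0 (score: 4)

To make this work, you need to "skip" string literals. You can do that by matching string literals and capturing them, so that they are kept.

The following regex does that when $1 is used as the replacement string: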

//.*|/\*(?s:.*?)\*/|("(?:(?<!\\)(?:\\\\)*\\"|[^\r\n"])*")

See regex101 for a demo.

And the Java code:

str1.replaceAll("//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\r\n\"])*\")", "$1")

Explanation

//.*                      Match // and rest of line
|                        or
/\*(?s:.*?)\*/            Match /* and */, with any characters in-between, incl. linebreaks
|                        or
("                        Start capture group and match "
  (?:                      Start repeating group:
     (?<!\\)(?:\\\\)*\\"     Match escaped " optionally prefixed by escaped \'s
     |                      or
     [^\r\n"]                Match any character except " and linebreak
  )*                       End of repeating group
")                        Match terminating ", and end of capture group
$1                        Keep captured string literal
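
As a rough check of the pattern explained above, here is a minimal sketch that applies it to the five lines from the question, with str5 written using properly escaped quotes. The class name `RegexCommentStripper` and the method `stripComments` are made up for this illustration; only `Pattern`, `Matcher.replaceAll` and the regex itself come from the answer or the JDK.

```java
import java.util.regex.Pattern;

class RegexCommentStripper {
    // Alternatives: line comment | block comment | (string literal, captured as group 1).
    private static final Pattern COMMENT_OR_STRING = Pattern.compile(
            "//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\r\n\"])*\")");

    /** Removes comments; a matched string literal is put back through the $1 replacement. */
    static String stripComments(String source) {
        // When a comment matched, group 1 did not participate and $1 expands to nothing.
        return COMMENT_OR_STRING.matcher(source).replaceAll("$1");
    }

    public static void main(String[] args) {
        String source =
                "String str1 = \"SUM 10\"      /*This is a Comments */ ;\n"
              + "String str2 = \"SUM 10\";     //This is a Comments\"\n"
              + "String str3 = \"http://google.com\";   /*This is a Comments*/\n"
              + "String str4 = \"('file:///xghsghsh.html/')\";  //Comments\n"
              + "String str5 = \"{\\\"temperature\\\": {\\\"type\\\"}}\";  //comments";
        System.out.println(stripComments(source));
    }
}
```

Whitespace that preceded a comment is left in place, so trimming trailing blanks would be a separate step. The sketch does not treat character literals such as '"' specially, which is a limitation of the pattern as given.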

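Answer 1 (score: 0)

As others have said, a regex is not a good fit here. You can use a simple DFA for this task.
Here is an example that gives you the start and end positions of multi-line comments (/* */).
You can do the same for single-line comments (// -- \n).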

    String input = ...; //here's your input String

    //0 - source code
    //1 - saw '/', possible start of a multi-line comment
    //2 - inside a multi-line comment (after "/*")
    //3 - saw '*' inside the comment, possible end
    byte state = 0;
    int startPos = -1;
    int endPos = -1;
    for (int i = 0; i < input.length(); i++) {
        char c = input.charAt(i);
        switch (state) {
        case 0:
            if (c == '/') {
                state = 1;
                startPos = i;
            }
            break;
        case 1:
            if (c == '*') {
                state = 2;
            } else if (c == '/') {
                startPos = i; //this '/' may itself start a comment
            } else {
                state = 0; //a lone '/', e.g. division
            }
            break;
        case 2:
            if (c == '*') {
                state = 3;
            }
            break;
        case 3:
            if (c == '/') {
                state = 0;
                endPos = i + 1;

                //here you have the comment between startPos and endPos indices,
                //you can do whatever you want with it
            } else if (c != '*') {
                state = 2; //the '*' did not close the comment
            }
            break;
        default:
            break;
        }
    }
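
The same idea can be extended into a stripper that handles both comment styles and skips string and character literals. The following is only a sketch under stated assumptions: the class `StateMachineCommentStripper`, its `State` enum and the method `stripComments` are hypothetical names, and text blocks (`"""`) and Unicode escapes are not handled.

```java
class StateMachineCommentStripper {
    private enum State { CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR, LITERAL, LITERAL_ESCAPE }

    /** Returns the input with // and block comments removed; literals are copied unchanged. */
    static String stripComments(String input) {
        StringBuilder out = new StringBuilder(input.length());
        State state = State.CODE;
        char quote = '"';                       // delimiter of the literal currently being read
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (state) {
            case CODE:
                if (c == '/') {
                    state = State.SLASH;        // may start a comment, so hold the '/' back
                } else {
                    if (c == '"' || c == '\'') { quote = c; state = State.LITERAL; }
                    out.append(c);
                }
                break;
            case SLASH:
                if (c == '/') {
                    state = State.LINE_COMMENT;
                } else if (c == '*') {
                    state = State.BLOCK_COMMENT;
                } else {
                    out.append('/').append(c);  // it was a lone '/', e.g. division
                    state = State.CODE;
                    if (c == '"' || c == '\'') { quote = c; state = State.LITERAL; }
                }
                break;
            case LINE_COMMENT:
                if (c == '\n' || c == '\r') {   // keep the line terminator
                    out.append(c);
                    state = State.CODE;
                }
                break;
            case BLOCK_COMMENT:
                if (c == '*') state = State.BLOCK_STAR;
                break;
            case BLOCK_STAR:
                if (c == '/') state = State.CODE;
                else if (c != '*') state = State.BLOCK_COMMENT;
                break;
            case LITERAL:
                out.append(c);
                if (c == '\\') state = State.LITERAL_ESCAPE;
                else if (c == quote) state = State.CODE;
                break;
            case LITERAL_ESCAPE:
                out.append(c);                  // escaped character never ends the literal
                state = State.LITERAL;
                break;
            }
        }
        if (state == State.SLASH) out.append('/'); // input ended on a lone '/'
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("String s = \"http://x\"; // c\nint a = 4 / 2; /* b\n */ int b;"));
    }
}
```

Compared with the regex, each decision depends only on the current state and character, so there is no backtracking, and adding a case such as text blocks means adding states rather than growing one pattern.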

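Answer 2 (score: 0)

{...wish I could comment...}

I suggest doing this in two passes: one for comments that run to the end of the line (//), and another for those that do not (/* */).

I like Pavel's idea; however, I don't see how it checks that the star is the very next character after the slash, and vice versa when the comment closes.

I like Andreas's idea; however, I could not get it to work for multi-line comments.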

https://docs.oracle.com/javase/specs/jls/se12/html/jls-3.html#jls-CommentTail
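
On the multi-line concern, here is a small sketch that feeds the single-pass regex from Answer 0 an input whose block comment spans three lines. The class name `MultiLineCommentCheck` and the method `strip` are made up for this illustration. The inline `(?s:...)` group is what allows `.` to cross line breaks in the block-comment branch only.

```java
import java.util.regex.Pattern;

class MultiLineCommentCheck {
    // Same pattern as in Answer 0: line comment | block comment | (captured string literal).
    static final Pattern COMMENT_OR_STRING = Pattern.compile(
            "//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\r\n\"])*\")");

    static String strip(String source) {
        return COMMENT_OR_STRING.matcher(source).replaceAll("$1");
    }

    public static void main(String[] args) {
        String source =
                "int a = 1; /* first line\n"
              + " * second line // not a line comment\n"
              + " */ int b = 2;\n"
              + "String s = \"/* not a comment */\";";
        System.out.println(strip(source));
    }
}
```

Because everything is matched in one left-to-right pass, a `//` inside a block comment and a `/*` inside a string literal are consumed by whichever construct started first, which is the case a two-pass approach has trouble with.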

Answer 3 (score: -1)

Perhaps it would be better to proceed step by step with several simple expressions, such as:

.*(\s*\/\*.*|\s*\/\/.*)

to remove the inline comments first.

Demo

Test

import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String regex = "(.*)(\\s*\\/\\*.*|\\s*\\/\\/.*)";
final String string = "    String str1 = \"SUM 10\"      /*This is a Comments */ ;   \n"
     + "    String str2 = \"SUM 10\";     //This is a Comments\"  \n"
     + "    String str3 = \"http://google.com\";   /*This is a Comments*/\n"
     + "    String str4 = \"('file:///xghsghsh.html/')\";  //Comments\n"
     + "    String str5 = \"{\\\"temperature\\\": {\\\"type\\\"}}\";  //comments";
final String subst = "$1"; // Java replacement strings reference groups as $1; "\\1" would insert a literal 1

final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);

// The substituted value will be contained in the result variable
final String result = matcher.replaceAll(subst);

System.out.println("Substitution result: " + result);