Question

I want to remove all kinds of comments from a Java source file. Example:

    String str1 = "SUM 10"      /*This is a Comments */ ;   
    String str2 = "SUM 10";     //This is a Comments"  
    String str3 = "http://google.com";   /*This is a Comments*/
    String str4 = "('file:///xghsghsh.html/')";  //Comments
    String str5 = "{\"temperature\": {\"type\"}}";  //comments

Expected output:

    String str1 = "SUM 10"; 
    String str2 = "SUM 10";  
    String str3 = "http://google.com";
    String str4 = "('file:///xghsghsh.html/')";
    String str5 = "{\"temperature\": {\"type\"}}";

I'm using the following regex to do it:

    System.out.println(str1.replaceAll("[^:]//.*|/\\*((?!=*/)(?s:.))+\\*/", ""));

This gives me wrong results for str4 and str5. Please help me fix it.

Using Andreas's solution:

        final String regex = "//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\\r\\n\"])*\")";
        final String string = "    String str1 = \"SUM 10\"      /*This is a Comments */ ;   \n"
             + "    String str2 = \"SUM 10\";     //This is a Comments\"  \n"
             + "    String str3 = \"http://google.com\";   /*This is a Comments*/\n"
             + "    String str4 = \"('file:///xghsghsh.html/')\";  //Comments\n"
             + "    String str5 = \"{\"temperature\": {\"type\"}}";  //comments";
        final String subst = "$1";

        // The substituted value will be contained in the result variable
        final String result = string.replaceAll(regex,subst);

        System.out.println("Substitution result: " + result);

It works for all of them except str5.
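The str5 failure may come from the test string rather than the regex. On the str5 line of the test code, each `\"` is only an escaped quote for the Java compiler, so the text being processed contains no backslashes, and the literal closes right after `}}`; the `;  //comments";` that follows ends the statement and turns the rest into a comment on the test code itself. The sketch below (the class name `Str5Escaping` and the field names are hypothetical) prints the line as written next to a version escaped with `\\\"`, which reproduces the str5 line from the question.

```java
public class Str5Escaping {
    // The str5 piece as it is written in the test string above: every \" becomes a
    // plain quote, so the text is {"temperature": {"type"}} with no backslashes,
    // and neither the closing ; nor //comments are part of the data.
    static final String AS_WRITTEN = "    String str5 = \"{\"temperature\": {\"type\"}}";

    // Hypothetical corrected piece: \\\" yields a backslash followed by a quote,
    // so the text matches the str5 source line from the question, comment included.
    static final String CORRECTED =
        "    String str5 = \"{\\\"temperature\\\": {\\\"type\\\"}}\";  //comments";

    public static void main(String[] args) {
        System.out.println(AS_WRITTEN);
        System.out.println(CORRECTED);
    }
}
```

With the corrected line as input, the regex from Andreas's answer keeps the string literal and removes only `//comments`.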

Answer 1

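To make it work, you need to "skip" string literals. You can preserve them by matching string literals and capturing them.

The following regex does that, using $1 as the replacement string: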

    //.*|/\*(?s:.*?)\*/|("(?:(?<!\\)(?:\\\\)*\\"|[^\r\n"])*")

See regex101 for a demo.

Then the Java code:

    str1.replaceAll("//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\r\n\"])*\")", "$1")

Explanation

    //.*                      Match // and rest of line
    |                         or
    /\*(?s:.*?)\*/            Match /* and */, with any characters in-between, incl. linebreaks
    |                         or
    ("                        Start capture group and match "
      (?:                       Start repeating group:
         (?<!\\)(?:\\\\)*\\"      Match escaped " optionally prefixed by escaped \'s
         |                        or
         [^\r\n"]                 Match any character except " and linebreak
      )*                        End of repeating group
    ")                        Match terminating ", and end of capture group

    $1                        Keep captured string literal
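A minimal, self-contained sketch of Answer 1's regex applied to lines from the question; the class name `StripComments` and the method `stripComments` are hypothetical. The comment alternatives are not captured, so `$1` replaces them with nothing, while a matched string literal is written back unchanged, which is why `//` inside `"http://..."` survives.

```java
import java.util.regex.Pattern;

public class StripComments {
    // Answer 1's pattern: line comment | block comment (may span lines) | captured string literal.
    private static final Pattern COMMENT_OR_STRING = Pattern.compile(
        "//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\\r\\n\"])*\")");

    public static String stripComments(String source) {
        // When a comment matched, group 1 did not participate and "$1" appends nothing.
        return COMMENT_OR_STRING.matcher(source).replaceAll("$1");
    }

    public static void main(String[] args) {
        String source = String.join("\n",
            "String str3 = \"http://google.com\";   /*This is a Comments*/",
            "String str4 = \"('file:///xghsghsh.html/')\";  //Comments",
            "String str5 = \"{\\\"temperature\\\": {\\\"type\\\"}}\";  //comments");
        // Whitespace in front of a removed comment is left in place.
        System.out.println(stripComments(source));
    }
}
```

The whitespace before each removed comment stays, so str3 keeps three trailing spaces; trim each line afterwards if that matters.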

Answer 2

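As others have said, regular expressions are not a good fit here. You can use a simple DFA (deterministic finite automaton) for this task.
Here is an example that gives you the spans of multi-line comments (/* */).
You can do the same for single-line comments (from // to \n).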

    String input = ...; // here's your input String

    // 0 - source code
    // 1 - possible comment start (saw '/')
    // 2 - inside a multi-line comment (saw "/*")
    // 3 - possible comment end (saw '*' inside the comment)
    byte state = 0;
    int startPos = -1;
    int endPos = -1;
    for (int i = 0; i < input.length(); i++) {
        char c = input.charAt(i);
        switch (state) {
        case 0:
            if (c == '/') {
                state = 1;
                startPos = i;
            }
            break;
        case 1:
            if (c == '*') {
                state = 2;
            } else if (c == '/') {
                startPos = i;   // this '/' may still start a comment
            } else {
                state = 0;      // a lone '/', e.g. division: back to source code
            }
            break;
        case 2:
            if (c == '*') {
                state = 3;
            }
            break;
        case 3:
            if (c == '/') {
                state = 0;
                endPos = i + 1;

                // here you have the comment between startPos and endPos indices,
                // you can do whatever you want with it
            } else if (c != '*') {
                state = 2;      // '*' not followed by '/': still inside the comment
            }
            break;
        default:
            break;
        }
    }
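The DFA above only tracks /* */ comments and does not know about string literals. Below is a minimal sketch extending the same state-machine idea to // comments and to string and char literals, so that `//` inside `"file:///..."` is kept. The class `CommentStripper`, its states, and the choice to keep the line break that ends a // comment are assumptions of this sketch, not part of the answer. It does not handle text blocks ("""), Unicode escapes such as \u002F, or line breaks inside /* */ comments (those are dropped along with the comment).

```java
public class CommentStripper {

    // Hypothetical scanner states.
    private enum State { CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR,
                         STRING, STRING_ESCAPE, CHAR, CHAR_ESCAPE }

    public static String strip(String src) {
        StringBuilder out = new StringBuilder(src.length());
        State state = State.CODE;
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            switch (state) {
                case CODE:
                    if (c == '/') {
                        state = State.SLASH;          // hold the '/' until the next char is known
                    } else {
                        out.append(c);
                        if (c == '"') state = State.STRING;
                        else if (c == '\'') state = State.CHAR;
                    }
                    break;
                case SLASH:
                    if (c == '/') {
                        state = State.LINE_COMMENT;
                    } else if (c == '*') {
                        state = State.BLOCK_COMMENT;
                    } else {
                        out.append('/');             // it was a plain '/', e.g. division
                        i--;                         // re-scan c in the CODE state
                        state = State.CODE;
                    }
                    break;
                case LINE_COMMENT:
                    if (c == '\n') {
                        out.append(c);               // keep the line break
                        state = State.CODE;
                    }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*') state = State.BLOCK_STAR;
                    break;
                case BLOCK_STAR:
                    if (c == '/') state = State.CODE;
                    else if (c != '*') state = State.BLOCK_COMMENT;
                    break;
                case STRING:
                    out.append(c);
                    if (c == '\\') state = State.STRING_ESCAPE;
                    else if (c == '"') state = State.CODE;
                    break;
                case STRING_ESCAPE:
                    out.append(c);                   // escaped char, e.g. the quote in \"
                    state = State.STRING;
                    break;
                case CHAR:
                    out.append(c);
                    if (c == '\\') state = State.CHAR_ESCAPE;
                    else if (c == '\'') state = State.CODE;
                    break;
                case CHAR_ESCAPE:
                    out.append(c);
                    state = State.CHAR;
                    break;
            }
        }
        if (state == State.SLASH) out.append('/');   // input ended right after a '/'
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "String str4 = \"('file:///xghsghsh.html/')\";  //Comments\n"
                   + "int x = 10 / 2; /* block\n comment */ char q = '\"';";
        System.out.println(strip(src));
    }
}
```

A hand-written scanner like this is more code than a regex, but each state change is explicit, which makes cases such as an escaped quote or a lone `/` easy to reason about.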

Answer 3

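{...I wish I could post this as a comment...}

I suggest two passes: one for end-of-line comments (//) and one for the others (/* */).

I like Pavel's idea; however, I don't see how it checks that the star is the character immediately after the slash, and vice versa when closing.

I like Andreas's idea; however, I couldn't get it to work for multi-line comments.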

https://docs.oracle.com/javase/specs/jls/se12/html/jls-3.html#jls-CommentTail
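On the multi-line point: in Andreas's pattern the block-comment alternative is wrapped in the inline flag `(?s:...)`, which lets `.` match line breaks, so one pass should remove a /* */ comment that spans several lines. A single pass also sidesteps an ordering problem that two separate passes have: in a line such as `// see /* here`, a block-comment pass run first would treat `/*` as an opener and could delete code up to the next `*/`. The sketch below (class name `MultiLineCommentDemo` is hypothetical) reuses Answer 1's pattern to check both cases.

```java
import java.util.regex.Pattern;

public class MultiLineCommentDemo {
    // Answer 1's pattern; (?s:.*?) lets the block-comment branch cross line breaks,
    // while //.* stops at the end of the line because plain '.' excludes line terminators.
    static final Pattern P = Pattern.compile(
        "//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\\r\\n\"])*\")");

    public static String strip(String s) {
        return P.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        String src = "int a = 1; /* first line\n   second line */ int b = 2;\n"
                   + "// line comment mentioning /* not a block comment\n"
                   + "int c = 3;";
        System.out.println(strip(src));
    }
}
```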

Answer 4

Perhaps it's better to go step by step with several simple expressions, for example:

    (.*)(\s*\/\*.*|\s*\/\/.*)

to remove the inline comments first.

Demo

Test

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main { // class name is arbitrary
        public static void main(String[] args) {
            final String regex = "(.*)(\\s*\\/\\*.*|\\s*\\/\\/.*)";
            final String string = "    String str1 = \"SUM 10\"      /*This is a Comments */ ;   \n"
                 + "    String str2 = \"SUM 10\";     //This is a Comments\"  \n"
                 + "    String str3 = \"http://google.com\";   /*This is a Comments*/\n"
                 + "    String str4 = \"('file:///xghsghsh.html/')\";  //Comments\n"
                 + "    String str5 = \"{\\\"temperature\\\": {\\\"type\\\"}}\";  //comments";
            // Java replacement strings refer to group 1 as $1; "\\1" would insert a literal "1"
            final String subst = "$1";

            final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
            final Matcher matcher = pattern.matcher(string);

            // The substituted value will be contained in the result variable
            final String result = matcher.replaceAll(subst);

            System.out.println("Substitution result: " + result);
        }
    }
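Before relying on this line-based pattern, it may be worth checking two inputs. Because `(.*)` is greedy and the comment branch ends in `.*`, the match starts at the last `/*` or `//` on a line and removes everything after it. The sketch below (class name `LineRegexCheck` is hypothetical) shows that this drops the `;` that follows str1's block comment and also cuts a `//` that sits inside a string literal when the line has no comment at all.

```java
import java.util.regex.Pattern;

public class LineRegexCheck {
    // Answer 4's pattern, with $1 as the Java back-reference to the code before the comment.
    static final Pattern P = Pattern.compile("(.*)(\\s*\\/\\*.*|\\s*\\/\\/.*)");

    public static String apply(String line) {
        return P.matcher(line).replaceAll("$1");
    }

    public static void main(String[] args) {
        // The ";" after the block comment is removed together with the comment.
        System.out.println(apply("String str1 = \"SUM 10\"      /*This is a Comments */ ;"));
        // A "//" inside a string literal is treated as the start of a comment.
        System.out.println(apply("String url = \"http://google.com\";"));
    }
}
```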

Remove comments from source code in Java
