Java regex to find Oracle single-line comments, except inside strings

Date: 2011-12-09 13:21:39

Tags: java regex

I want to find Oracle single-line comments, but not comment markers that appear inside strings.

For example:

-- This is a valid single line comment

but

'This is a string -- and it is not a comment';

I am using this regular expression to find single-line comments:

--.*$
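
As a quick illustration, here is a minimal sketch of how this pattern behaves on two lines from the script further down. The class name NaiveCommentDemo and the helper firstMatch are made up for this example, and (?m) is added on the assumption that a whole script may be searched at once, so that $ matches at every line end.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveCommentDemo {
    // The pattern from the question; (?m) is an addition so that $ matches at each line end.
    static final Pattern NAIVE = Pattern.compile("(?m)--.*$");

    // Returns the first match of the naive pattern in the given text, or null if there is none.
    static String firstMatch(String text) {
        Matcher m = NAIVE.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // No string literal on the line: the comment is found as intended.
        System.out.println(firstMatch("tmp varchar(2) ; -- this is a comment"));
        // A "--" inside a string literal is matched first, so the result starts inside the string.
        System.out.println(firstMatch("tmp2 varchar(3) := 'some more --text'; -- this is one more comment"));
    }
}
```

The second call returns --text'; -- this is one more comment, because the pattern has no notion of being inside a string literal.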

This regex handles a few cases, but there are several more complicated ones. You can use this script as a reference:

-- this is a single line comment

CREATE OR REPLACE PROCEDURE "MAIL_WITH_ATTACHMENT" ( ) 
IS    
tmp varchar(2) ; -- this is a comment 
tmp1 varchar(2) := 'some texxt'; -- this is another comment
tmp2 varchar(3) := 'some more --text'; -- this is one more comment
tmp3 varchar(4) := 'this regex isn't --working properly'; -- Don't you think this is another comment
BEGIN

          '--This is a Mime message, which your current mail reader may not' || crlf ||
          ' some more -- characters in a string';

    mesg:= crlf ||
          '--This is a Mime message, which your current mail reader may not' || crlf ||
      ' some more -- characters in a string';
END; 

The result must be:

[1] : -- this is a single line comment
[2] : -- this is a comment 
[3] : -- this is another comment
[4] : -- this is one more comment
[5] : -- Don't you think this is another comment

Thanks

3 answers:

Answer 0 (score: 4)

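Personally, I would use a SQL parser to remove these comments. The problem with a regex is that it is not really aware of its surroundings: it is hard for a regex to tell whether a single quote is inside a comment, or whether a -- is inside a string literal.

You can get around this by using a regex that matches from the start of the line and also matches string literals, which makes it behave more like a lexer (the first stage of parsing).

Such a regex could look like this: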

(?m)^((?:(?!--|').|'(?:''|[^'])*')*)--.*$

A quick breakdown of the regex:

(?m)                 # enable multi-line mode
^                    # match the start of the line
(                    # start match group 1
  (?:                #   start non-capturing group 1
    (?!--|').        #     if there's no '--' or single quote ahead, match any char (except a line break)
    |                #     OR
    '(?:''|[^'])*'   #     match a string literal
  )*                 #   end non-capturing group 1 and repeat it zero or more times
)                    # end match group 1
--.*$                # match a comment all the way to the end of the line

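In plain English: from the start of each line, try to match zero or more of:

  • a string literal ('(?:''|[^'])*');
  • or any character, as long as it is not a single quote, a line break, or a - that starts a comment ((?!--|').).

and store this match in group 1. Then match the comment (--.*$).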
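
To see the string-literal branch in isolation, the following minimal sketch runs just that sub-pattern against a line containing a doubled (escaped) quote. StringLiteralDemo and firstLiteral are hypothetical names used for this illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StringLiteralDemo {
    // Sub-pattern from the regex above: an opening quote, then any run of
    // '' (escaped quote) or non-quote characters, then a closing quote.
    static final Pattern LITERAL = Pattern.compile("'(?:''|[^'])*'");

    // Returns the first string literal found in the text, or null if there is none.
    static String firstLiteral(String text) {
        Matcher m = LITERAL.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The '' inside the literal does not end it, so the "--" stays inside the match.
        System.out.println(firstLiteral("tmp3 varchar(4) := 'this regex isn''t --working properly'; -- Don't"));
    }
}
```

This returns 'this regex isn''t --working properly' as one literal, which is why the -- inside it is never considered as the start of a comment.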

So now all you need to do is replace this pattern with whatever was matched in group 1. A demo:

String sql = "-- this is a single line comment\n" +
             "\n" +
             "CREATE OR REPLACE PROCEDURE \"MAIL_WITH_ATTACHMENT\" ( ) \n" +
             "IS    \n" +
             "tmp varchar(2) ; -- this is a comment \n" +
             "tmp1 varchar(2) := 'some texxt'; -- this is another comment\n" +
             "tmp2 varchar(3) := 'some more --text'; -- this is one more comment\n" +
             "tmp3 varchar(4) := 'this regex isn''t --working properly'; -- Don't you think this is another comment\n" +
             "BEGIN\n" +
             "\n" +
             "          '--This is a Mime message, which your current mail reader may not' || crlf ||\n" +
             "          ' some more -- characters in a string';\n" +
             "\n" +
             "    mesg:= crlf ||\n" +
             "          '--This is a Mime message, which your current mail reader may not' || crlf ||\n" +
             "      ' some more -- characters in a string';\n" +
             "END; ";
String stripped = sql.replaceAll("(?m)^((?:(?!--|').|'(?:''|[^'])*')*)--.*$", "$1[REMOVED COMMENT]");
System.out.println(stripped);

will print:

[REMOVED COMMENT]

CREATE OR REPLACE PROCEDURE "MAIL_WITH_ATTACHMENT" ( ) 
IS    
tmp varchar(2) ; [REMOVED COMMENT]
tmp1 varchar(2) := 'some texxt'; [REMOVED COMMENT]
tmp2 varchar(3) := 'some more --text'; [REMOVED COMMENT]
tmp3 varchar(4) := 'this regex isn''t --working properly'; [REMOVED COMMENT]
BEGIN

          '--This is a Mime message, which your current mail reader may not' || crlf ||
          ' some more -- characters in a string';

    mesg:= crlf ||
          '--This is a Mime message, which your current mail reader may not' || crlf ||
      ' some more -- characters in a string';
END; 

Edit

If you only want to extract the comments, wrap the capture group around --.*$ and use Pattern and Matcher's find() method:

Matcher m = Pattern.compile("(?m)^(?:(?!--|').|'(?:''|[^'])*')*(--.*)$").matcher(sql);
while(m.find()) {
  System.out.println(m.group(1));
}

will print:

-- this is a single line comment
-- this is a comment 
-- this is another comment
-- this is one more comment
-- Don't you think this is another comment
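
The lexer-like idea can also be written out by hand without any regex. The following is only a sketch under simplifying assumptions: it knows about single-quoted literals with '' as the escaped quote and about -- comments, and nothing else (no /* */ block comments, no double-quoted identifiers, no q'[...]' alternative quoting). CommentScanner and extractComments are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class CommentScanner {
    // Collects every "--" comment that starts outside a single-quoted literal.
    static List<String> extractComments(String sql) {
        List<String> comments = new ArrayList<>();
        boolean inString = false;
        int i = 0;
        while (i < sql.length()) {
            char c = sql.charAt(i);
            if (inString) {
                if (c == '\'') {
                    if (i + 1 < sql.length() && sql.charAt(i + 1) == '\'') {
                        i++; // '' is an escaped quote: skip its second half, stay in the literal
                    } else {
                        inString = false; // closing quote
                    }
                }
                i++;
            } else if (c == '\'') {
                inString = true; // opening quote
                i++;
            } else if (c == '-' && i + 1 < sql.length() && sql.charAt(i + 1) == '-') {
                // Comment runs to the end of the line (or the end of the input).
                int end = sql.indexOf('\n', i);
                if (end < 0) {
                    end = sql.length();
                }
                comments.add(sql.substring(i, end));
                i = end;
            } else {
                i++;
            }
        }
        return comments;
    }

    public static void main(String[] args) {
        String sql = "tmp2 varchar(3) := 'some more --text'; -- this is one more comment\n"
                   + "tmp3 varchar(4) := 'this regex isn''t --working properly'; -- Don't you think this is another comment\n"
                   + "mesg := '--This is a Mime message' || crlf;";
        for (String comment : extractComments(sql)) {
            System.out.println(comment);
        }
    }
}
```

On this input the sketch reports the two trailing comments and nothing from inside the literals. A real SQL parser remains the more robust choice for anything beyond this.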

Answer 1 (score: 1)

This should help, if you are reading line by line:

   str = str.replaceAll("'{1}.*'{1}", "").replaceFirst(".*--", "--");

Input: -sd' - asdsa --- asdsadasdsad' || ' asdsad' || ' asdsadasd' -- x something here

Output: -- x something here

Edit: final version after 3 edits :)
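
A small check of how this one-liner behaves on two lines from the question, assuming line-by-line input as the answer suggests. OneLinerCheck and extract are hypothetical names. Both quantifiers are greedy, so the replaceAll call removes everything from the first quote to the last quote on the line, including a quote that sits inside the comment itself.

```java
class OneLinerCheck {
    // The replacement chain from the answer, applied to a single line.
    static String extract(String str) {
        return str.replaceAll("'{1}.*'{1}", "").replaceFirst(".*--", "--");
    }

    public static void main(String[] args) {
        // The only quotes are in the literal: the comment is isolated.
        System.out.println(extract("tmp2 varchar(3) := 'some more --text'; -- this is one more comment"));
        // The apostrophe in "Don't" is the last quote on the line, so the
        // greedy match swallows the "--" of the real comment as well.
        System.out.println(extract("tmp3 varchar(4) := 'this regex isn''t --working properly'; -- Don't you think this is another comment"));
    }
}
```

The first call returns -- this is one more comment. The second returns tmp3 varchar(4) := t you think this is another comment, so a comment containing an apostrophe is lost. A line with no comment at all comes back with only its quoted parts removed rather than as an empty result.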

Answer 2 (score: 1)

This regex should work fine:

Pattern p = Pattern.compile("^[^']*('[^']*'[^']*)*(--.*)$");

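Except for case [5]. But before you start over-complicating the regex, are you sure Oracle won't complain about that string?

Edit

This is the code I used to test the regex: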

String[] text =
    {
        "-- this is a single line comment",
        "",
        "CREATE OR REPLACE PROCEDURE \"MAIL_WITH_ATTACHMENT\" ( ) ",
        "IS    ",
        "tmp varchar(2) ; -- this is a comment ",
        "tmp1 varchar(2) := 'some texxt'; -- this is another comment",
        "tmp2 varchar(3) := 'some more --text'; 'blah --blah' -- this is one more comment",
        "tmp3 varchar(4) := 'this regex isn't --working properly'; -- Don't you think this is another comment",
        "BEGIN",
        "",
        "          '--This is a Mime message, which your current mail reader may not' || crlf ||",
        "          ' some more -- characters in a string';",
        "",
        "    mesg:= crlf ||",
        "          '--This is a Mime message, which your current mail reader may not' || crlf ||",
        "      ' some more -- characters in a string';",
        "END; "
    };

Pattern p = Pattern.compile("^[^']*('[^']*'[^']*)*(--.*)$");
Matcher m = p.matcher("");

for (String s : text) {
  m.reset(s);
  if (m.find()) {
    System.out.println(m.group(m.groupCount()));
  }
}

This is the output:

-- this is a single line comment
-- this is a comment 
-- this is another comment
-- this is one more comment
--working properly'; -- Don't you think this is another comment

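As you can see, the last line of the output is "wrong". But, as you said, Oracle doesn't like such a string either. After correcting isn't to isn''t, the output will also be correct.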
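
A minimal sketch to check that last claim, using the same pattern on the tmp3 line with the quote doubled. EscapedQuoteCheck and comment are hypothetical names for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedQuoteCheck {
    // Same pattern as in the test code above; group 2 holds the comment.
    static final Pattern P = Pattern.compile("^[^']*('[^']*'[^']*)*(--.*)$");

    // Returns the comment found on the line, or null if the pattern does not match.
    static String comment(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // isn''t is seen as two adjacent quoted runs, so the quotes pair up
        // before the real comment starts.
        System.out.println(comment("tmp3 varchar(4) := 'this regex isn''t --working properly'; -- Don't you think this is another comment"));
    }
}
```

With the doubled quote the pattern returns -- Don't you think this is another comment. The pattern does not treat '' as an escape as such. It simply reads isn''t as the end of one quoted run followed directly by the start of another, which happens to pair the quotes correctly here.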