Find Oracle single-line comments, except comment markers that appear inside strings.
For example:
-- This is a valid single line comment
but
'This is a string -- and it is not a comment';
I am using this regular expression to find single-line comments:
--.*$
It handles a few cases, but fails on several more complicated ones. You can use this script as a reference:
-- this is a single line comment
CREATE OR REPLACE PROCEDURE "MAIL_WITH_ATTACHMENT" ( )
IS
tmp varchar(2) ; -- this is a comment
tmp1 varchar(2) := 'some texxt'; -- this is another comment
tmp2 varchar(3) := 'some more --text'; -- this is one more comment
tmp3 varchar(4) := 'this regex isn't --working properly'; -- Don't you think this is another comment
BEGIN
'--This is a Mime message, which your current mail reader may not' || crlf ||
' some more -- characters in a string';
mesg:= crlf ||
'--This is a Mime message, which your current mail reader may not' || crlf ||
' some more -- characters in a string';
END;
The result must be this:
[1] : -- this is a single line comment
[2] : -- this is a comment
[3] : -- this is another comment
[4] : -- this is one more comment
[5] : -- Don't you think this is another comment
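To make the failure concrete, here is a minimal sketch that runs the naive pattern over two lines of the script. The class and method names (NaiveCommentRegex, findNaive) are made up for illustration; only java.util.regex is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveCommentRegex {
    // Collects every match of the naive pattern "--.*$" in multi-line mode.
    static List<String> findNaive(String sql) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile("(?m)--.*$").matcher(sql);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String sql = "tmp varchar(2) ; -- this is a comment\n"
                   + "tmp2 varchar(3) := 'some more --text'; -- this is one more comment";
        // The second hit starts inside the string literal, at "--text'".
        for (String hit : findNaive(sql)) {
            System.out.println(hit);
        }
    }
}
```

The first line is matched correctly, but on the second line the match starts at the -- inside the string literal, which is exactly the case that needs to be excluded.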
Thanks
Answer 0 (score: 4)
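Personally, I would use a SQL parser to strip these comments. The problem with regex is that it is not really aware of its surroundings: it is hard for a regex to determine whether a single quote is inside a comment, or whether a -- is inside a string literal.
You can work around this by using a regex that matches from the start of the line and also matches string literals, which makes it behave more like a lexer (the first stage of parsing).
Such a regex could look like this: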
(?m)^((?:(?!--|').|'(?:''|[^'])*')*)--.*$
A quick breakdown of the regex:
(?m) # enable multi-line mode
^ # match the start of the line
( # start match group 1
(?: # start non-capturing group 1
(?!--|'). # if there's no '--' or single quote ahead, match any char (except a line break)
| # OR
'(?:''|[^'])*' # match a string literal
)* # end non-capturing group 1 and repeat it zero or more times
) # end match group 1
--.*$ # match a comment all the way to the end of the line
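In plain English, that reads as follows: from each start of a line, try to match zero or more occurrences of either a string literal ('(?:''|[^'])*') or any character that is neither a single quote nor the start of a -- comment ((?!--|').), and store this match in group 1. Then match the comment (--.*$).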
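The same idea can be sketched without a regex, as a small hand-written scanner that tracks whether it is inside a string literal. This is only an illustration under stated assumptions, not the answer's method: the names CommentScanner, commentStart and extractComment are made up, the scanner works on one line at a time (so it assumes string literals do not span lines), and it ignores /* */ block comments and Oracle's q'[...]' quoting.

```java
public class CommentScanner {
    // Returns the index where a "--" comment starts in the line, or -1 if there is none.
    static int commentStart(String line) {
        boolean inString = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean hasNext = i + 1 < line.length();
            if (inString) {
                if (c == '\'') {
                    if (hasNext && line.charAt(i + 1) == '\'') {
                        i++;              // '' is an escaped quote, stay inside the literal
                    } else {
                        inString = false; // closing quote
                    }
                }
            } else if (c == '\'') {
                inString = true;          // opening quote
            } else if (c == '-' && hasNext && line.charAt(i + 1) == '-') {
                return i;                 // "--" outside any literal starts a comment
            }
        }
        return -1;
    }

    // Returns the comment text (including the leading "--"), or null if the line has none.
    static String extractComment(String line) {
        int start = commentStart(line);
        return start < 0 ? null : line.substring(start);
    }

    public static void main(String[] args) {
        System.out.println(extractComment(
            "tmp3 varchar(4) := 'this regex isn''t --working properly'; -- Don't you think this is another comment"));
        System.out.println(extractComment("    ' some more -- characters in a string';"));
    }
}
```

Once a comment has started, the scanner stops looking at quotes, so an apostrophe inside the comment text (as in "Don't") does not confuse it.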
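So all you need to do now is replace this pattern with whatever was matched in group 1 (the demo below also appends a [REMOVED COMMENT] marker so the effect is visible). A demo: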
String sql = "-- this is a single line comment\n" +
"\n" +
"CREATE OR REPLACE PROCEDURE \"MAIL_WITH_ATTACHMENT\" ( ) \n" +
"IS \n" +
"tmp varchar(2) ; -- this is a comment \n" +
"tmp1 varchar(2) := 'some texxt'; -- this is another comment\n" +
"tmp2 varchar(3) := 'some more --text'; -- this is one more comment\n" +
"tmp3 varchar(4) := 'this regex isn''t --working properly'; -- Don't you think this is another comment\n" +
"BEGIN\n" +
"\n" +
" '--This is a Mime message, which your current mail reader may not' || crlf ||\n" +
" ' some more -- characters in a string';\n" +
"\n" +
" mesg:= crlf ||\n" +
" '--This is a Mime message, which your current mail reader may not' || crlf ||\n" +
" ' some more -- characters in a string';\n" +
"END; ";
String stripped = sql.replaceAll("(?m)^((?:(?!--|').|'(?:''|[^'])*')*)--.*$", "$1[REMOVED COMMENT]");
System.out.println(stripped);
which will print:
[REMOVED COMMENT]
CREATE OR REPLACE PROCEDURE "MAIL_WITH_ATTACHMENT" ( )
IS
tmp varchar(2) ; [REMOVED COMMENT]
tmp1 varchar(2) := 'some texxt'; [REMOVED COMMENT]
tmp2 varchar(3) := 'some more --text'; [REMOVED COMMENT]
tmp3 varchar(4) := 'this regex isn''t --working properly'; [REMOVED COMMENT]
BEGIN
'--This is a Mime message, which your current mail reader may not' || crlf ||
' some more -- characters in a string';
mesg:= crlf ||
'--This is a Mime message, which your current mail reader may not' || crlf ||
' some more -- characters in a string';
END;
If you only want to extract the comments, wrap a capture group around --.*$ and use a Pattern and Matcher to find() them:
Matcher m = Pattern.compile("(?m)^(?:(?!--|').|'(?:''|[^'])*')*(--.*)$").matcher(sql);
while(m.find()) {
System.out.println(m.group(1));
}
which will print:
-- this is a single line comment
-- this is a comment
-- this is another comment
-- this is one more comment
-- Don't you think this is another comment
Answer 1 (score: 1)
This should help, if you are reading the input line by line:
str = str.replaceAll("'{1}.*'{1}", "").replaceFirst(".*--", "--");
Input: -sd'--asdsa---asdsadasdsad'||' asdsad'||' asdsadasd' --x something here
Output: --x something here
Edit: final version after 3 edits :)
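As a quick check, the sketch below applies the two chained replacements to two lines from the question's script. The wrapper names (LineByLineCheck, extract) are made up for illustration; the replacement calls are copied from the answer unchanged.

```java
public class LineByLineCheck {
    // The two chained replacements from the answer, applied to a single line.
    static String extract(String line) {
        return line.replaceAll("'{1}.*'{1}", "").replaceFirst(".*--", "--");
    }

    public static void main(String[] args) {
        // A simple line: the literal is removed, then everything before "--" is dropped.
        System.out.println(extract(
            "tmp1 varchar(2) := 'some texxt'; -- this is another comment"));
        // A line whose comment contains an apostrophe ("Don't").
        System.out.println(extract(
            "tmp3 varchar(4) := 'this regex isn''t --working properly'; -- Don't you think this is another comment"));
    }
}
```

The first line yields the comment as expected. On the second line the greedy '.*' appears to run from the first quote to the apostrophe in "Don't", which swallows the -- marker, so the comment is not recovered. This suggests the one-liner is best suited to lines whose comments contain no single quotes.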
Answer 2 (score: 1)
This regex should work fine:
Pattern p = Pattern.compile("^[^']*('[^']*'[^']*)*(--.*)$");
except for case [5]. But before over-complicating the regex, are you sure Oracle will not complain about that string?
Edit
Here is the code I used to test the regex:
String[] text =
{
"-- this is a single line comment",
"",
"CREATE OR REPLACE PROCEDURE \"MAIL_WITH_ATTACHMENT\" ( ) ",
"IS ",
"tmp varchar(2) ; -- this is a comment ",
"tmp1 varchar(2) := 'some texxt'; -- this is another comment",
"tmp2 varchar(3) := 'some more --text'; 'blah --blah' -- this is one more comment",
"tmp3 varchar(4) := 'this regex isn't --working properly'; -- Don't you think this is another comment",
"BEGIN",
"",
" '--This is a Mime message, which your current mail reader may not' || crlf ||",
" ' some more -- characters in a string';",
"",
" mesg:= crlf ||",
" '--This is a Mime message, which your current mail reader may not' || crlf ||",
" ' some more -- characters in a string';", "END; ", };
Pattern p = Pattern.compile("^[^']*('[^']*'[^']*)*(--.*)$");
Matcher m = p.matcher("");
for (String s : text) {
m.reset(s);
if (m.find()) {
System.out.println(m.group(m.groupCount()));
}
}
Here is the output:
-- this is a single line comment
-- this is a comment
-- this is another comment
-- this is one more comment
--working properly'; -- Don't you think this is another comment
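As you can see, the last line of the output is "wrong". But, as you said, Oracle does not like such a string either. After correcting isn't to isn''t, the output will be correct as well.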
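The claim about the corrected literal can be checked with a small sketch that runs the same pattern on both variants of the line. The class and method names (EscapedQuoteCheck, comment) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedQuoteCheck {
    // Same pattern as above; group 2 holds the trailing comment.
    private static final Pattern P = Pattern.compile("^[^']*('[^']*'[^']*)*(--.*)$");

    // Returns the captured comment, or null if the line does not match.
    static String comment(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String broken = "tmp3 varchar(4) := 'this regex isn't --working properly'; -- Don't you think this is another comment";
        String fixed  = "tmp3 varchar(4) := 'this regex isn''t --working properly'; -- Don't you think this is another comment";
        System.out.println(comment(broken)); // starts at "--working", inside the malformed literal
        System.out.println(comment(fixed));  // starts at "-- Don't", the real comment
    }
}
```

With the doubled quote, the pattern treats isn''t as two adjacent quoted pieces, which is enough for it to skip past the -- inside the literal and land on the real comment.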