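I need a regular expression to capture all comments in a block of T-SQL. The expression needs to work with the .NET Regex class.
Let's say I have the following T-SQL: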
-- This is Comment 1
SELECT Foo FROM Bar
GO
-- This is
-- Comment 2
UPDATE Bar SET Foo = 'Foo'
GO
/* This is Comment 3 */
DELETE FROM Bar WHERE Foo = 'Foo'
/* This is a
multi-line comment */
DROP TABLE Bar
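I need to capture all of the comments, including the multi-line ones, so that I can strip them out.
EDIT: An expression that captures everything except the comments would serve the same purpose.
Answer 0 (score: 16)
This should work: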
(--.*)|(((/\*)+?[\w\W]+?(\*/)+))
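As a rough illustration of how a pattern like this can be applied, here is a minimal Java sketch (the question targets .NET, but java.util.regex accepts this particular pattern unchanged). The class name SimpleCommentStripper and the method strip are hypothetical names chosen for the example. Note that the pattern does not protect string literals, so a "--" inside quotes is treated as the start of a comment.

```java
import java.util.regex.Pattern;

public class SimpleCommentStripper {
    // The pattern from the answer above, with only Java string escaping added.
    private static final Pattern COMMENT =
            Pattern.compile("(--.*)|(((/\\*)+?[\\w\\W]+?(\\*/)+))");

    /** Removes every match of COMMENT. String literals are NOT protected. */
    public static String strip(String sql) {
        return COMMENT.matcher(sql).replaceAll("");
    }

    public static void main(String[] args) {
        String sql = "-- This is Comment 1\n"
                + "SELECT Foo FROM Bar\n"
                + "/* This is a\nmulti-line comment */\n"
                + "DROP TABLE Bar";
        System.out.println(strip(sql));

        // Caveat: the "--" inside this literal is also stripped,
        // together with everything after it on the same line.
        System.out.println(strip("SELECT '--not a comment' FROM Bar"));
    }
}
```

If the script may contain comment markers inside string literals, a pattern that matches literals first (as several of the other answers do) is the safer choice.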
Answer 1 (score: 9)
In PHP, I use this code to strip comments from SQL (this is the commented version, using the x modifier):
trim( preg_replace( '@
(([\'"]).*?[^\\\]\2) # $1 : Skip single & double quoted expressions
|( # $3 : Match comments
(?:\#|--).*?$ # - Single line comment
| # - Multi line (nested) comments
/\* # . comment open marker
(?: [^/*] # . non comment-marker characters
|/(?!\*) # . not a comment open
|\*(?!/) # . not a comment close
|(?R) # . recursive case
)* # . repeat eventually
\*\/ # . comment close marker
)\s* # Trim after comments
|(?<=;)\s+ # Trim after semi-colon
@msx', '$1', $sql ) );
Short version:
trim( preg_replace( '@(([\'"]).*?[^\\\]\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)\s*|(?<=;)\s+@ms', '$1', $sql ) );
Answer 2 (score: 5)
Use this code:
StringCollection resultList = new StringCollection();
try {
Regex regexObj = new Regex(@"/\*(?>(?:(?!\*/|/\*).)*)(?>(?:/\*(?>(?:(?!\*/|/\*).)*)\*/(?>(?:(?!\*/|/\*).)*))*).*?\*/|--.*?\r?[\n]", RegexOptions.Singleline);
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
resultList.Add(matchResult.Value);
matchResult = matchResult.NextMatch();
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
With the following input:
-- This is Comment 1
SELECT Foo FROM Bar
GO
-- This is
-- Comment 2
UPDATE Bar SET Foo = 'Foo'
GO
/* This is Comment 3 */
DELETE FROM Bar WHERE Foo = 'Foo'
/* This is a
multi-line comment */
DROP TABLE Bar
/* comment /* nesting */ of /* two */ levels supported */
foo...
Produces these matches:
-- This is Comment 1
-- This is
-- Comment 2
/* This is Comment 3 */
/* This is a
multi-line comment */
/* comment /* nesting */ of /* two */ levels supported */
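Note that this matches only two levels of nested comments, although I have never in my life seen more than one level used. Ever.
Answer 3 (score: 3)
I wrote this function, which removes all SQL comments using plain regular expressions. It removes both line comments (even when no line break follows them) and block comments (even when they contain nested block comments). The function can also replace literals, which is useful if you are searching for something inside SQL procedures but want to ignore strings.
My code is based on this answer (about C# comments), so I had to change the line comments from "//" to "--". More importantly, I had to rewrite the block comment regex (using balancing groups), because SQL allows nested block comments while C# does not.
There is also a "preservePositions" argument which, instead of stripping out the comments, fills them with whitespace. That is useful if you want to keep the original position of each SQL command, in case you need to manipulate the original script while keeping its comments.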
Regex everythingExceptNewLines = new Regex("[^\r\n]");
public string RemoveComments(string input, bool preservePositions, bool removeLiterals=false)
{
//based on https://stackoverflow.com/questions/3524317/regex-to-strip-line-comments-from-c-sharp/3524689#3524689
var lineComments = @"--(.*?)\r?\n";
var lineCommentsOnLastLine = @"--(.*?)$"; // because it's possible that there's no \r\n after the last line comment
// literals ('literals'), bracketedIdentifiers ([object]) and quotedIdentifiers ("object"), they follow the same structure:
// there's the start character, any consecutive pairs of closing characters are considered part of the literal/identifier, and then comes the closing character
var literals = @"('(('')|[^'])*')"; // 'John', 'O''malley''s', etc
var bracketedIdentifiers = @"\[((\]\])|[^\]])* \]"; // [object], [ % object]] ], etc
var quotedIdentifiers = @"(\""((\""\"")|[^""])*\"")"; // "object", "object[]", etc - when QUOTED_IDENTIFIER is set to ON, they are identifiers, else they are literals
//var blockComments = @"/\*(.*?)\*/"; //the original code was for C#, but Microsoft SQL allows a nested block comments // //https://msdn.microsoft.com/en-us/library/ms178623.aspx
//so we should use balancing groups // http://weblogs.asp.net/whaggard/377025
var nestedBlockComments = @"/\*
(?>
/\* (?<LEVEL>) # On opening push level
|
\*/ (?<-LEVEL>) # On closing pop level
|
(?! /\* | \*/ ) . # Match any char unless the opening and closing strings
)* # /* or */ in the lookahead string (zero or more, so /**/ also matches)
(?(LEVEL)(?!)) # If level exists then fail
\*/";
string noComments = Regex.Replace(input,
nestedBlockComments + "|" + lineComments + "|" + lineCommentsOnLastLine + "|" + literals + "|" + bracketedIdentifiers + "|" + quotedIdentifiers,
me => {
if (me.Value.StartsWith("/*") && preservePositions)
return everythingExceptNewLines.Replace(me.Value, " "); // preserve positions and keep line-breaks // return new string(' ', me.Value.Length);
else if (me.Value.StartsWith("/*") && !preservePositions)
return "";
else if (me.Value.StartsWith("--") && preservePositions)
return everythingExceptNewLines.Replace(me.Value, " "); // preserve positions and keep line-breaks
else if (me.Value.StartsWith("--") && !preservePositions)
return everythingExceptNewLines.Replace(me.Value, ""); // preserve only line-breaks // Environment.NewLine;
else if (me.Value.StartsWith("[") || me.Value.StartsWith("\""))
return me.Value; // do not remove object identifiers ever
else if (!removeLiterals) // Keep the literal strings
return me.Value;
else if (removeLiterals && preservePositions) // remove literals, but preserving positions and line-breaks
{
var literalWithLineBreaks = everythingExceptNewLines.Replace(me.Value, " ");
return "'" + literalWithLineBreaks.Substring(1, literalWithLineBreaks.Length - 2) + "'";
}
else if (removeLiterals && !preservePositions) // wrap completely all literals
return "''";
else
throw new NotImplementedException();
},
RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);
return noComments;
}
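The position-preserving idea used by the function above can be sketched on its own. The following is a simplified Java illustration, not a port of the C# code: it handles only single-quoted literals and non-nested block comments, and the names PositionPreservingBlanker and blankComments are hypothetical. Each comment is overwritten with spaces while line breaks are kept, so every remaining character stays at its original line and column.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PositionPreservingBlanker {
    // Simplified assumption: string literal | non-nested block comment | line comment.
    private static final Pattern TOKEN = Pattern.compile(
            "'(?:''|[^'])*'|/\\*[\\s\\S]*?\\*/|--[^\\r\\n]*");

    /** Replaces comment characters with spaces, keeping line breaks and literals. */
    public static String blankComments(String sql) {
        Matcher m = TOKEN.matcher(sql);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous token and this one unchanged.
            out.append(sql, last, m.start());
            String token = m.group();
            if (token.startsWith("'")) {
                out.append(token); // literals are kept as they are
            } else {
                // Overwrite everything except \r and \n with a space.
                out.append(token.replaceAll("[^\\r\\n]", " "));
            }
            last = m.end();
        }
        out.append(sql, last, sql.length());
        return out.toString();
    }
}
```

Because the output has the same length and the same line breaks as the input, offsets found in the blanked text can be used directly against the original script.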
Test 1 (first the original, then with comments removed, and finally with comments and literals removed)
[select /* block comment */ top 1 'a' /* block comment /* nested block comment */*/ from sys.tables --LineComment
union
select top 1 '/* literal with */-- lots of comments symbols' from sys.tables --FinalLineComment]
[select top 1 'a' from sys.tables
union
select top 1 '/* literal with */-- lots of comments symbols' from sys.tables ]
[select top 1 ' ' from sys.tables
union
select top 1 ' ' from sys.tables ]
Test 2 (first the original, then with comments removed, and finally with comments and literals removed)
Original:
[create table [/*] /*
-- huh? */
(
"--
--" integer identity, -- /*
[*/] varchar(20) /* -- */
default '*/ /* -- */' /* /* /* */ */ */
);
go]
[create table [/*]
(
"--
--" integer identity,
[*/] varchar(20)
default '*/ /* -- */'
);
go]
[create table [/*]
(
"--
--" integer identity,
[*/] varchar(20)
default ' '
);
go]
Answer 4 (score: 1)
This works for me:
(/\*(.|[\r\n])*?\*/)|(--(.*|[\r\n]))
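It matches all comments that start with -- or are enclosed within /* .. */ blocks.
Answer 5 (score: 1)
I see you are using Microsoft's SQL Server (as opposed to Oracle or MySQL). If you relax the regex requirement, it is now possible (since 2012) to use Microsoft's own parser: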
using System.Collections.Generic; // IList<T>
using System.Linq; // Where, Select
using Microsoft.SqlServer.Management.TransactSql.ScriptDom;
...
public string StripCommentsFromSQL( string SQL ) {
TSql110Parser parser = new TSql110Parser( true );
IList<ParseError> errors;
var fragments = parser.Parse( new System.IO.StringReader( SQL ), out errors );
// clear comments
string result = string.Join (
string.Empty,
fragments.ScriptTokenStream
.Where( x => x.TokenType != TSqlTokenType.MultilineComment )
.Where( x => x.TokenType != TSqlTokenType.SingleLineComment )
.Select( x => x.Text ) );
return result;
}
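Answer 6 (score: 1)
The following works fine: pg-minify, not only for PostgreSQL but for MS-SQL as well.
Presumably, if we are removing comments, the script is no longer meant for reading, so minifying it at the same time is a good idea.
The library removes all comments as part of script minification.
Answer 7 (score: 0)
I am using this Java code to remove all SQL comments from text. It supports /* ... */ and --... comments and ignores comment markers inside quoted strings. Nested comments are only partly handled: the lazy block-comment pattern stops at the first */.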
// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
public static String stripComments(String sqlCommand) {
StringBuilder result = new StringBuilder();
//group 1 must be quoted string
Pattern pattern = Pattern.compile("('(''|[^'])*')|(/\\*(.|[\\r\\n])*?\\*/)|(--(.*|[\\r\\n]))");
Matcher matcher = pattern.matcher(sqlCommand);
int prevIndex = 0;
while(matcher.find()) {
// add previous portion of string that was not found by regexp - meaning this is not a quoted string and not a comment
result.append(sqlCommand, prevIndex, matcher.start());
prevIndex = matcher.end();
// add the quoted string
if (matcher.group(1) != null) {
result.append(sqlCommand, matcher.start(), matcher.end());
}
}
result.append(sqlCommand.substring(prevIndex));
return result.toString();
}
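Since java.util.regex has neither balancing groups nor recursion, one way to handle arbitrarily nested block comments in Java is a small hand-written scanner with a depth counter. The following is a minimal sketch under stated assumptions, not a full T-SQL lexer: it protects only single-quoted strings (bracketed and double-quoted identifiers are not handled), and the names NestedCommentStripper and strip are hypothetical.

```java
public class NestedCommentStripper {
    /** Removes -- line comments and (possibly nested) block comments. */
    public static String strip(String sql) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = sql.length();
        while (i < n) {
            char c = sql.charAt(i);
            if (c == '\'') {
                // Copy a quoted string verbatim; '' is an escaped quote.
                int j = i + 1;
                while (j < n) {
                    if (sql.charAt(j) == '\'') {
                        if (j + 1 < n && sql.charAt(j + 1) == '\'') {
                            j += 2;
                            continue;
                        }
                        j++;
                        break;
                    }
                    j++;
                }
                out.append(sql, i, j);
                i = j;
            } else if (sql.startsWith("--", i)) {
                // Skip to the end of the line, leaving the line break in place.
                while (i < n && sql.charAt(i) != '\n' && sql.charAt(i) != '\r') {
                    i++;
                }
            } else if (sql.startsWith("/*", i)) {
                // Track nesting depth until the outermost comment closes.
                int depth = 0;
                do {
                    if (sql.startsWith("/*", i)) {
                        depth++;
                        i += 2;
                    } else if (sql.startsWith("*/", i)) {
                        depth--;
                        i += 2;
                    } else {
                        i++;
                    }
                } while (depth > 0 && i < n);
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

An unterminated block comment or string simply runs to the end of the input in this sketch; a stricter version might report an error instead.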