Regex to match all comments in a T-SQL script

Time: 2011-10-07 16:51:33

Tags: sql regex tsql

I need a regular expression that captures all comments in a block of T-SQL. The expression has to work with the .NET Regex class.

Let's say I have the following T-SQL:

-- This is Comment 1
SELECT Foo FROM Bar
GO

-- This is
-- Comment 2
UPDATE Bar SET Foo == 'Foo'
GO

/* This is Comment 3 */
DELETE FROM Bar WHERE Foo = 'Foo'

/* This is a
multi-line comment */
DROP TABLE Bar

I need to capture all of the comments, including the multi-line ones, so that I can strip them out.

EDIT: An expression that captures everything BUT the comments would serve the same purpose.

8 answers:

Answer 0: (score: 16)

This should work:

(--.*)|(((/\*)+?[\w\W]+?(\*/)+))
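
To see what this pattern captures, here is a minimal sketch in Python rather than .NET (an assumption made for illustration; this particular pattern uses no .NET-only syntax). The SAMPLE text and the helper names find_comments and strip_comments are hypothetical, not part of the answer.

```python
import re

# Pattern copied from the answer above.
COMMENT_RE = re.compile(r"(--.*)|(((/\*)+?[\w\W]+?(\*/)+))")

# Hypothetical sample script, shortened from the one in the question.
SAMPLE = (
    "-- This is Comment 1\n"
    "SELECT Foo FROM Bar\n"
    "/* This is a\n"
    "multi-line comment */\n"
    "DROP TABLE Bar\n"
)

def find_comments(sql):
    """Return every comment the pattern matches, in order of appearance."""
    return [m.group(0) for m in COMMENT_RE.finditer(sql)]

def strip_comments(sql):
    """Delete every match; the line breaks that followed the comments remain."""
    return COMMENT_RE.sub("", sql)

print(find_comments(SAMPLE))
print(strip_comments(SAMPLE))
```

The pattern has no notion of string literals, so a `--` or `/*` inside a quoted string would also be treated as the start of a comment.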

Answer 1: (score: 9)

In PHP, I use this code to strip the comments from SQL (this is the commented version, hence the x modifier):

trim( preg_replace( '@
(([\'"]).*?[^\\\]\2) # $1 : Skip single & double quoted expressions
|(                   # $3 : Match comments
    (?:\#|--).*?$    # - Single line comment
    |                # - Multi line (nested) comments
     /\*             #   . comment open marker
        (?: [^/*]    #   . non comment-marker characters
            |/(?!\*) #   . not a comment open
            |\*(?!/) #   . not a comment close
            |(?R)    #   . recursive case
        )*           #   . repeat eventually
    \*\/             #   . comment close marker
)\s*                 # Trim after comments
|(?<=;)\s+           # Trim after semi-colon
@msx', '$1', $sql ) );

Short version:

trim( preg_replace( '@(([\'"]).*?[^\\\]\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)\s*|(?<=;)\s+@ms', '$1', $sql ) );

Answer 2: (score: 5)

Using this code:

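// Requires: using System; using System.Collections.Specialized; using System.Text.RegularExpressions;
StringCollection resultList = new StringCollection();
try {
    // First alternative: block comments with up to two levels of nesting.
    // Second alternative: line comments, including the line break that ends them.
    Regex regexObj = new Regex(@"/\*(?>(?:(?!\*/|/\*).)*)(?>(?:/\*(?>(?:(?!\*/|/\*).)*)\*/(?>(?:(?!\*/|/\*).)*))*).*?\*/|--.*?\r?[\n]", RegexOptions.Singleline);
    Match matchResult = regexObj.Match(subjectString);
    while (matchResult.Success) {
        resultList.Add(matchResult.Value);
        matchResult = matchResult.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}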

With the following input:

-- This is Comment 1
SELECT Foo FROM Bar
GO

-- This is
-- Comment 2
UPDATE Bar SET Foo == 'Foo'
GO

/* This is Comment 3 */
DELETE FROM Bar WHERE Foo = 'Foo'

/* This is a
multi-line comment */
DROP TABLE Bar

/* comment /* nesting */ of /* two */ levels supported */
foo...

Produces these matches:

-- This is Comment 1
-- This is
-- Comment 2
/* This is Comment 3 */
/* This is a
multi-line comment */
/* comment /* nesting */ of /* two */ levels supported */

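Note that this only matches comments nested up to 2 levels deep, although in my life I have never seen more than one level being used. Ever.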
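
If deeper nesting ever matters, one alternative to a fixed-depth regex is a small scanner that keeps a depth counter. The following is a rough sketch in Python, not the answer's method; the function name strip_comments_nested is hypothetical, and the sketch deliberately assumes the script contains no string literals or quoted identifiers with comment markers in them.

```python
def strip_comments_nested(sql):
    """Remove -- line comments and /* */ block comments nested to any depth.

    Assumption: comment markers never occur inside string literals or
    quoted identifiers; this sketch does not try to recognize those.
    """
    out = []
    i, n = 0, len(sql)
    while i < n:
        two = sql[i:i + 2]
        if two == "--":
            # Line comment: skip up to, but not including, the line break.
            while i < n and sql[i] not in "\r\n":
                i += 1
        elif two == "/*":
            # Block comment: count openers and closers until depth returns to 0.
            depth = 1
            i += 2
            while i < n and depth > 0:
                pair = sql[i:i + 2]
                if pair == "/*":
                    depth += 1
                    i += 2
                elif pair == "*/":
                    depth -= 1
                    i += 2
                else:
                    i += 1
        else:
            out.append(sql[i])
            i += 1
    return "".join(out)


print(strip_comments_nested("/* a /* b /* c */ d */ e */foo -- x\nbar"))
```

An unterminated block comment swallows the rest of the input in this sketch, which seems like the least surprising choice for a stripper.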

Answer 3: (score: 3)

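I wrote this function, which removes all SQL comments using plain regular expressions. It removes both line comments (even when no line break follows) and block comments (even when they contain nested block comments). The function can also blank out literals, which is useful if you are searching for something inside SQL procedures but want to ignore strings.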

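My code is based on this answer (which is about C# comments), so I had to change the line comments from "//" to "--". More importantly, I had to rewrite the block-comment regex (using balancing groups), because SQL allows nested block comments while C# does not.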

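I also added a "preservePositions" argument, which fills the comments with whitespace instead of stripping them out. That is useful if you want to keep the original position of each SQL command, in case you need to manipulate the original script while preserving its comments.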

Regex everythingExceptNewLines = new Regex("[^\r\n]");
public string RemoveComments(string input, bool preservePositions, bool removeLiterals=false)
{
    //based on https://stackoverflow.com/questions/3524317/regex-to-strip-line-comments-from-c-sharp/3524689#3524689

    var lineComments = @"--(.*?)\r?\n";
    var lineCommentsOnLastLine = @"--(.*?)$"; // because it's possible that there's no \r\n after the last line comment
    // literals ('literals'), bracketedIdentifiers ([object]) and quotedIdentifiers ("object"), they follow the same structure:
    // there's the start character, any consecutive pairs of closing characters are considered part of the literal/identifier, and then comes the closing character
    var literals = @"('(('')|[^'])*')"; // 'John', 'O''malley''s', etc
    var bracketedIdentifiers = @"\[((\]\])|[^\]])* \]"; // [object], [ % object]] ], etc
    var quotedIdentifiers = @"(\""((\""\"")|[^""])*\"")"; // "object", "object[]", etc - when QUOTED_IDENTIFIER is set to ON, they are identifiers, else they are literals
    //var blockComments = @"/\*(.*?)\*/";  //the original code was for C#, but Microsoft SQL allows a nested block comments // //https://msdn.microsoft.com/en-us/library/ms178623.aspx
    //so we should use balancing groups // http://weblogs.asp.net/whaggard/377025
    var nestedBlockComments = @"/\*
                                (?>
                                /\*  (?<LEVEL>)      # On opening push level
                                | 
                                \*/ (?<-LEVEL>)     # On closing pop level
                                |
                                (?! /\* | \*/ ) . # Match any char unless the opening and closing strings   
                                )*                         # /* or */ in the lookahead string; * rather than + so an empty /**/ also matches
                                (?(LEVEL)(?!))             # If level exists then fail
                                \*/";

    string noComments = Regex.Replace(input,
            nestedBlockComments + "|" + lineComments + "|" + lineCommentsOnLastLine + "|" + literals + "|" + bracketedIdentifiers + "|" + quotedIdentifiers,
        me => {
            if (me.Value.StartsWith("/*") && preservePositions)
                return everythingExceptNewLines.Replace(me.Value, " "); // preserve positions and keep line-breaks // return new string(' ', me.Value.Length);
            else if (me.Value.StartsWith("/*") && !preservePositions)
                return "";
            else if (me.Value.StartsWith("--") && preservePositions)
                return everythingExceptNewLines.Replace(me.Value, " "); // preserve positions and keep line-breaks
            else if (me.Value.StartsWith("--") && !preservePositions)
                return everythingExceptNewLines.Replace(me.Value, ""); // preserve only line-breaks // Environment.NewLine;
            else if (me.Value.StartsWith("[") || me.Value.StartsWith("\""))
                return me.Value; // do not remove object identifiers ever
            else if (!removeLiterals) // Keep the literal strings
                return me.Value;
            else if (removeLiterals && preservePositions) // remove literals, but preserving positions and line-breaks
            {
                var literalWithLineBreaks = everythingExceptNewLines.Replace(me.Value, " ");
                return "'" + literalWithLineBreaks.Substring(1, literalWithLineBreaks.Length - 2) + "'";
            }
            else if (removeLiterals && !preservePositions) // wrap completely all literals
                return "''";
            else
                throw new NotImplementedException();
        },
        RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);
    return noComments;
}

Test 1 (first the original, then with comments removed, and finally with comments and literals removed)

[select /* block comment */ top 1 'a' /* block comment /* nested block comment */*/ from  sys.tables --LineComment
union
select top 1 '/* literal with */-- lots of comments symbols' from sys.tables --FinalLineComment]

[select                     top 1 'a'                                               from  sys.tables              
union
select top 1 '/* literal with */-- lots of comments symbols' from sys.tables                   ]

[select                     top 1 ' '                                               from  sys.tables              
union
select top 1 '                                             ' from sys.tables                   ]

Test 2 (first the original, then with comments removed, and finally with comments and literals removed)

Original:
[create table [/*] /* 
  -- huh? */
(
    "--
     --" integer identity, -- /*
    [*/] varchar(20) /* -- */
         default '*/ /* -- */' /* /* /* */ */ */
);
            go]


[create table [/*]    

(
    "--
     --" integer identity,      
    [*/] varchar(20)         
         default '*/ /* -- */'                  
);
            go]


[create table [/*]    

(
    "--
     --" integer identity,      
    [*/] varchar(20)         
         default '           '                  
);
            go]
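
The position-preserving idea can be shown on its own with a much smaller sketch. This one is in Python and uses a simplified, non-nested comment pattern, so it is only an illustration of the blanking step and not a port of the function above; the names COMMENT_RE, NOT_NEWLINE_RE and blank_comments are hypothetical.

```python
import re

# Simplified comment pattern for illustration only: no nesting, and no
# awareness of string literals or quoted identifiers.
COMMENT_RE = re.compile(r"/\*.*?\*/|--[^\r\n]*", re.DOTALL)
NOT_NEWLINE_RE = re.compile(r"[^\r\n]")

def blank_comments(sql):
    """Overwrite each comment with spaces, leaving line breaks where they were."""
    return COMMENT_RE.sub(lambda m: NOT_NEWLINE_RE.sub(" ", m.group(0)), sql)

sql = "SELECT a /* x\ny */ FROM t -- tail\nGO"
blanked = blank_comments(sql)

# Every remaining token sits at the same offset and on the same line as before.
print(len(blanked) == len(sql))
print(blanked.index("FROM") == sql.index("FROM"))
```

Because only non-newline characters are replaced, line and column numbers reported against the blanked text still apply to the original script.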

Answer 4: (score: 1)

This works for me:

(/\*(.|[\r\n])*?\*/)|(--(.*|[\r\n]))

It matches all comments that start with -- or are enclosed in /* .. */ blocks.
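
One thing to watch for with a comment-only pattern like this is a comment marker inside a string literal. The sketch below, in Python for illustration, runs the pattern above next to a variant that tries a literal alternative first; that literal branch and the names strip_naive and strip_literal_aware are additions for this example, not part of the answer.

```python
import re

# The answer's pattern, unchanged.
NAIVE_RE = re.compile(r"(/\*(.|[\r\n])*?\*/)|(--(.*|[\r\n]))")

# Same idea, but a single-quoted literal (group 1) is tried first so that
# comment markers inside it are consumed as part of the literal.
LITERAL_AWARE_RE = re.compile(r"('(?:''|[^'])*')|/\*(?:.|[\r\n])*?\*/|--.*")

def strip_naive(sql):
    """Remove whatever the comment-only pattern matches."""
    return NAIVE_RE.sub("", sql)

def strip_literal_aware(sql):
    """Put literals back untouched and drop everything else that matched."""
    return LITERAL_AWARE_RE.sub(lambda m: m.group(1) or "", sql)

sql = "SELECT '-- not a comment' AS c -- real comment"
print(repr(strip_naive(sql)))          # "SELECT '"
print(repr(strip_literal_aware(sql)))  # "SELECT '-- not a comment' AS c "
```

Bracketed and double-quoted identifiers would need the same treatment; they are left out here to keep the sketch short.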

Answer 5: (score: 1)

I see you are using Microsoft's SQL Server (as opposed to Oracle or MySQL). If you relax the regex requirement, it has been possible since 2012 to use Microsoft's own parser:

using System.Collections.Generic;
using System.Linq;
using Microsoft.SqlServer.Management.TransactSql.ScriptDom;

...

public string StripCommentsFromSQL( string SQL ) {

    TSql110Parser parser = new TSql110Parser( true );
    IList<ParseError> errors;
    var fragments = parser.Parse( new System.IO.StringReader( SQL ), out errors );

    // clear comments
    string result = string.Join ( 
      string.Empty,
      fragments.ScriptTokenStream
          .Where( x => x.TokenType != TSqlTokenType.MultilineComment )
          .Where( x => x.TokenType != TSqlTokenType.SingleLineComment )
          .Select( x => x.Text ) );

    return result;

}

See Removing Comments From SQL.

Answer 6: (score: 1)

The following works fine: pg-minify, which handles not only PostgreSQL but MS-SQL as well.

Presumably, if we are removing comments, the script is no longer meant to be read, so minifying it at the same time is a good idea.

The library removes all comments as part of the script minification.

Answer 7: (score: 0)

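I am using this Java code to remove all SQL comments from text. It supports /* ... */ and --... comments and ignores comment markers inside quoted strings. It is also meant to handle nested comments, but because the block pattern is lazy it stops at the first */, so a block comment nested inside another block comment is only partly removed.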

  // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
  public static String stripComments(String sqlCommand) {
    StringBuilder result = new StringBuilder();
    //group 1 must be quoted string
    Pattern pattern = Pattern.compile("('(''|[^'])*')|(/\\*(.|[\\r\\n])*?\\*/)|(--(.*|[\\r\\n]))");
    Matcher matcher = pattern.matcher(sqlCommand);
    int prevIndex = 0;
    while(matcher.find()) {
      // add previous portion of string that was not found by regexp - meaning this is not a quoted string and not a comment
      result.append(sqlCommand, prevIndex, matcher.start());
      prevIndex = matcher.end();
      // add the quoted string
      if (matcher.group(1) != null) {
        result.append(sqlCommand, matcher.start(), matcher.end());
      }
    }
    result.append(sqlCommand.substring(prevIndex));
    return result.toString();
  }