
Question

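I need to reliably remove all JavaScript comments with a single regular expression.

I have searched StackOverflow and other sites, but none of the solutions account for alternating quotes, multi-line comments, comments inside strings, regular expressions, and so on.

Is there any regular expression that can remove the comments from this: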

var test = [
    "// Code",
    '// Code',
    "'// Code",
    '"// Code',
    //" Comment",
    //' Comment',
    /* Comment */
    // Comment /* Comment
    /* Comment
     Comment // */ "Code",
    "Code",
    "/* Code */",
    "/* Code",
    "Code */",
    '/* Code */',
    '/* Code',
    'Code */',
    /* Comment
    "Comment",
    Comment */ "Code",
    /Code\/*/,
    "Code */"
]

Here is a jsbin or jsfiddle to test it.

Answer 1

I love challenges :)

Here's my working solution:

/((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm

Replace it with $1.
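A minimal sketch of using the pattern from JavaScript, assuming a plain ES2015+ environment such as Node or a browser console. stripComments is a hypothetical helper name, not something from the answer:

```javascript
// Answer 1's pattern, copied verbatim as a regex literal.
const COMMENT_RE =
  /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// Strings and regex literals land in group 1 and are written back by "$1".
// Comments match one of the other two branches, where group 1 is undefined,
// so "$1" expands to an empty string and the comment disappears.
function stripComments(source) {
  return source.replace(COMMENT_RE, "$1");
}

console.log(stripComments('var a = "// kept";// removed'));
// var a = "// kept";
console.log(stripComments("x = 1 /* removed */ + 2;"));
// x = 1  + 2;
```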

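Fiddle here: http://jsfiddle.net/LucasTrz/DtGq8/6/

Of course, as has been pointed out countless times, a proper parser would probably be better, but still...

Note: in the fiddle I used a regex literal rather than a regex string; all the extra escaping would break your brain.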

Breakdown

((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/) <-- the part to keep
|\/\/.*?$                                                         <-- line comments
|\/\*[\s\S]*?\*\/                                                 <-- inline comments

The part to keep

(["'])(?:\\[\s\S]|.)*?\2                   <-- strings
\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/     <-- regex literals

Strings

    ["']              match a quote and capture it
    (?:\\[\s\S]|.)*?  match escaped characters or unescaped characters, don't capture
    \2                match the same type of quote as the one that opened the string
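The string branch can be tried on its own. In this sketch it is pulled out of the full pattern, so the backreference becomes \1 (the quote is now group 1) instead of \2, and a ^ anchor is added only to show how much one literal consumes. NAIVE_STRING_RE is a hypothetical comparison pattern without the escape branch:

```javascript
// The "strings" branch, standalone and anchored for the demo.
const STRING_RE = /^(["'])(?:\\[\s\S]|.)*?\1/;
// Hypothetical naive version: no special handling of backslash escapes.
const NAIVE_STRING_RE = /^(["']).*?\1/;

// String.raw keeps the backslash, so the text is: 'don\'t // stop' + rest
const src = String.raw`'don\'t // stop' + rest`;

console.log(STRING_RE.exec(src)[0]);       // 'don\'t // stop'
console.log(NAIVE_STRING_RE.exec(src)[0]); // 'don\'  (stops at the escaped quote)
```

Because \\[\s\S] is tried before the lazy quantifier is allowed to close on the quote, an escaped quote is consumed as a pair and never ends the string.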

Regex literals

    \/                          match a forward slash
    (?![*\/])                   ... not followed by a * or / (that would start a comment)
    (?:\\.|\[(?:\\.|.)\]|.)*?   match any sequence of escaped/unescaped text, or a regex character class
    \/                          ... until the closing slash
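A similar sketch for the regex-literal branch, again anchored with ^ for the demo. NO_CLASS_RE is a hypothetical variant without the character-class branch, included to show why that branch exists. Note that, as written, the class branch matches exactly one character or escape between the brackets, so /[/]/ is handled but a class such as [a/] is not:

```javascript
// The "regex literals" branch, anchored for the demo.
const REGEX_LITERAL_RE = /^\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\//;
// Hypothetical variant without the \[...\] branch, for comparison.
const NO_CLASS_RE = /^\/(?![*\/])(?:\\.|.)*?\//;

const literal = "/[/]/g.test(x)";
console.log(REGEX_LITERAL_RE.exec(literal)[0]); // /[/]/
console.log(NO_CLASS_RE.exec(literal)[0]);      // /[/   (cut short)

// The lookahead rejects "/*", leaving it to the comment branch.
console.log(REGEX_LITERAL_RE.exec("/* c */"));  // null
```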

The parts to remove

|\/\/.*?$              <-- line comments
|\/\*[\s\S]*?\*\/      <-- inline comments

Line comments

    \/\/         match two forward slashes
    .*?$         then everything until the end of the line
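The m flag is what makes $ match at every line end rather than only at the end of the input. A quick comparison on a made-up two-line snippet:

```javascript
const LINE_COMMENT_RE = /\/\/.*?$/gm;
const code = "a = 1; // one\nb = 2; // two";

// With m: each "//" runs to the end of its own line.
console.log(JSON.stringify(code.replace(LINE_COMMENT_RE, "")));
// "a = 1; \nb = 2; "

// Without m: $ only matches at the very end, and "." cannot cross "\n",
// so the first comment is skipped.
console.log(JSON.stringify(code.replace(/\/\/.*?$/g, "")));
// "a = 1; // one\nb = 2; "
```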

Inline comments

    \/\*         match /*
    [\s\S]*?     then as few as possible of anything, see note below
    \*\/         match */
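The lazy *? matters once a file has more than one block comment. A small comparison on a made-up snippet, with GREEDY as a hypothetical variant that drops the ?:

```javascript
const code = "a /* one\n */ b /* two */ c";

const LAZY = /\/\*[\s\S]*?\*\//g;  // Answer 1's branch: stops at the first */
const GREEDY = /\/\*[\s\S]*\*\//g; // hypothetical: runs to the last */

console.log(code.replace(LAZY, ""));   // a  b  c
console.log(code.replace(GREEDY, "")); // a  c   (the code "b" is lost)
```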

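I had to use [\s\S] instead of . because, unfortunately, JavaScript did not support the regex s modifier (single-line mode, which also lets . match newlines). ES2018 has since added it as the s (dotAll) flag; [\s\S] remains the portable choice for older engines.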
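A quick check of the three options on a comment that spans two lines (the s flag requires an ES2018 or later engine):

```javascript
const text = "/* line one\nline two */";

// "." does not match line terminators, so the lazy scan hits "\n" and fails.
const withDot = /\/\*.*?\*\//.test(text);
// [\s\S] is "any whitespace or any non-whitespace", i.e. any character.
const withClass = /\/\*[\s\S]*?\*\//.test(text);
// ES2018 "s" (dotAll) flag: "." matches line terminators too.
const withDotAll = /\/\*.*?\*\//s.test(text);

console.log(withDot, withClass, withDotAll); // false true true
```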

This regex handles the following corner cases:

Regex patterns containing / inside a character class: /[/]/
Escaped newlines in string literals
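Both corner cases can be checked with a short sketch; the two input strings are made-up examples, and the pattern is Answer 1's first regex copied verbatim:

```javascript
const COMMENT_RE =
  /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// Corner case 1: "/" inside a one-character class, then a line comment.
const classCase = "var re = /[/]/; // trailing comment";

// Corner case 2: backslash-newline continuation inside a string literal,
// where the continued line starts with something that looks like a comment.
const continuationCase = 'var s = "first \\\n// still inside the string";';

console.log(classCase.replace(COMMENT_RE, "$1"));        // var re = /[/]/; 
console.log(continuationCase.replace(COMMENT_RE, "$1")); // unchanged
```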

Final boss fight

Just for the fun of it... here's the mind-bending hardcore version:

/((["'])(?:\\[\s\S]|.)*?\2|(?:[^\w\s]|^)\s*\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/(?=[gmiy]{0,4}\s*(?![*\/])(?:\W|$)))|\/\/.*?$|\/\*[\s\S]*?\*\//gm

This adds handling for the following twisted edge cases (fiddle, regex101):

Code = /* Comment */ /Code regex/g  ; // Comment
Code = Code / Code /* Comment */ /g  ; // Comment    
Code = /Code regex/g /* Comment */  ; // Comment

This is highly heuristic code. You probably shouldn't use it (even less so than the previous regex) and should just accept that edge case.
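A sketch comparing both patterns on the three lines above (copied with trailing whitespace trimmed). On the second line, the first pattern reads / Code / as a regex literal, so /* Comment */ survives; the hardcore pattern only accepts a regex literal after a non-word, non-space character (or at line start), so it skips the division and removes the comment. On the other two lines both patterns give the same result:

```javascript
// Answer 1's first pattern and the hardcore variant, copied verbatim.
const BASIC_RE =
  /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;
const HARDCORE_RE =
  /((["'])(?:\\[\s\S]|.)*?\2|(?:[^\w\s]|^)\s*\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/(?=[gmiy]{0,4}\s*(?![*\/])(?:\W|$)))|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

const cases = [
  "Code = /* Comment */ /Code regex/g  ; // Comment",
  "Code = Code / Code /* Comment */ /g  ; // Comment",
  "Code = /Code regex/g /* Comment */  ; // Comment",
];

for (const line of cases) {
  // JSON.stringify makes the trailing spaces visible.
  console.log("basic:   ", JSON.stringify(line.replace(BASIC_RE, "$1")));
  console.log("hardcore:", JSON.stringify(line.replace(HARDCORE_RE, "$1")));
}
```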

Answer 2

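First, I'd recommend doing this with a proper JavaScript parser. See this earlier Q&A: JavaScript parser in JavaScript

For the input you posted¹, here is a solution that might work:

Match pattern:

/("(?:[^\r\n\\"]|\\.)*"|'(?:[^\r\n\\']|\\.)*'|\/[^*\/]([^\\\/]|\\.)*\/[gm]*)|\/\/[^\r\n]*|\/\*[\s\S]*?\*\//g

Here is a breakdown of the pattern: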

/
  (                                     # start match group 1
      "(?:[^\r\n\\"]|\\.)*"             #   match a double quoted string
    | '(?:[^\r\n\\']|\\.)*'             #   match a single quoted string
    | \/[^*\/]([^\\\/]|\\.)*\/[gm]*     #   match a regex literal
  )                                     # end match group 1
  | \/\/[^\r\n]*                        # match a single-line comment
  | \/\*[\s\S]*?\*\/                    # match a multi-line comment
/g
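JavaScript regex literals don't accept the # comments and free spacing used in the breakdown above, so it can't be pasted in as is. One way to keep the annotations in real code (our own sketch, not from the answer) is to write each branch as its own literal and join their .source strings with new RegExp; KEEP, LINE_COMMENT and BLOCK_COMMENT are hypothetical names:

```javascript
// Group 1: double-quoted string | single-quoted string | regex literal.
const KEEP = /("(?:[^\r\n\\"]|\\.)*"|'(?:[^\r\n\\']|\\.)*'|\/[^*\/]([^\\\/]|\\.)*\/[gm]*)/;
const LINE_COMMENT = /\/\/[^\r\n]*/;     // single-line comment
const BLOCK_COMMENT = /\/\*[\s\S]*?\*\//; // multi-line comment

// .source returns each pattern's text; joining with "|" rebuilds the alternation.
const COMBINED = new RegExp(
  [KEEP, LINE_COMMENT, BLOCK_COMMENT].map((re) => re.source).join("|"),
  "g"
);

// The one-line version from the answer, for comparison.
const ONE_LINE = /("(?:[^\r\n\\"]|\\.)*"|'(?:[^\r\n\\']|\\.)*'|\/[^*\/]([^\\\/]|\\.)*\/[gm]*)|\/\/[^\r\n]*|\/\*[\s\S]*?\*\//g;

console.log(COMBINED.source === ONE_LINE.source); // true
```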

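Then replace it with $1 (match group 1). The trick is that everything except comments is matched in group 1 and replaced with itself, while comments match outside group 1 and are replaced with an empty string.

Here is a regexr demo showing the following replacement: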
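The $1 trick works because, in String.prototype.replace, a $n whose group did not participate in the match expands to an empty string. A sketch showing that the string form and an explicit callback give the same result (the input is a made-up example):

```javascript
const RE = /("(?:[^\r\n\\"]|\\.)*"|'(?:[^\r\n\\']|\\.)*'|\/[^*\/]([^\\\/]|\\.)*\/[gm]*)|\/\/[^\r\n]*|\/\*[\s\S]*?\*\//g;

const src = "var a = '/* x */'; /* gone */ var b = 1; // gone";

// String replacement: "$1" becomes "" when group 1 did not match.
const viaString = src.replace(RE, "$1");

// Callback form: an unmatched group is passed as undefined.
const viaCallback = src.replace(RE, (match, keep) => (keep === undefined ? "" : keep));

console.log(JSON.stringify(viaString)); // "var a = '/* x */';  var b = 1; "
console.log(viaString === viaCallback); // true
```

The callback form is handy if you later want to treat strings and regex literals differently, since keep holds the matched literal.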

  var test = [
      "// Code",
      '// Code',
      "'// Code",
      '"// Code',




       "Code",
      "Code",
      "/* Code */",
      "/* Code",
      "Code */",
      '/* Code */',
      '/* Code',
      'Code */',
       "Code",
      /Code\/*/,
      "Code */"
  ]

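¹Again, a parser is the way to go, because regex literals can be confused with the division operator. If your source contains an assignment like var x = a / b / g;, the solution above will break!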

Answer 3

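I'd suggest parsing the JavaScript with an actual JavaScript parser and then using the parser's API to strip out what you don't want. I haven't done this myself, but regular expressions should be limited to regular languages, and I doubt JavaScript syntax is one.

Here's a good place to start: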

JavaScript parser in JavaScript

Answer 4

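"Is there any regex that can remove comments?"

No. You can't build a regex that matches comments (so that you could simply replace the matches with an empty string), because without lookbehind it's impossible to tell whether //" is a comment or the end of a string literal.

You can use regular expressions as a tokenizer (you "only" have to handle string literals, regex literals, and both kinds of comments), but I'd recommend a full JavaScript parser; free ones are available.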
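The tokenizer approach can be sketched as a single left-to-right scan. This is our own minimal sketch, not code from the answer: stripCommentsScan is a hypothetical name, template literals are not handled, and deciding between a regex literal and division by looking only at the previous non-whitespace character is a heuristic assumption that misreads cases such as return /x/:

```javascript
// Heuristic (assumption): "/" starts a regex literal after these characters
// or at the very start; after identifiers, numbers, ")" or "]" it is division.
const REGEX_ALLOWED_AFTER = "(,=:[!&|?{};+-*%<>~^";

function stripCommentsScan(src) {
  let out = "";
  let prev = ""; // last non-whitespace character written to out
  let i = 0;

  while (i < src.length) {
    const ch = src[i];
    const next = src[i + 1];

    if (ch === "/" && next === "/") {
      // Line comment: skip up to the line break, which is kept.
      while (i < src.length && src[i] !== "\n" && src[i] !== "\r") i++;
    } else if (ch === "/" && next === "*") {
      // Block comment: skip past the closing */ (or to the end if unclosed).
      const end = src.indexOf("*/", i + 2);
      i = end === -1 ? src.length : end + 2;
    } else if (
      ch === '"' ||
      ch === "'" ||
      (ch === "/" && (prev === "" || REGEX_ALLOWED_AFTER.includes(prev)))
    ) {
      // String or regex literal: find the closing delimiter, honoring
      // backslash escapes and, for regexes, "/" inside [...] classes.
      let j = i + 1;
      let inClass = false;
      while (j < src.length) {
        const c = src[j];
        if (c === "\\") {
          j += 2; // skip the escaped character
          continue;
        }
        if (ch === "/" && c === "[") inClass = true;
        else if (ch === "/" && c === "]") inClass = false;
        else if (c === ch && !inClass) break;
        j++;
      }
      out += src.slice(i, j + 1); // copy the literal including its delimiter
      prev = ch;
      i = j + 1;
    } else {
      // Ordinary code (including a division "/") is copied through.
      out += ch;
      if (!/\s/.test(ch)) prev = ch;
      i++;
    }
  }
  return out;
}

console.log(stripCommentsScan("var half = total / 2; // divide by two"));
```

Unlike the single-regex versions, the division above is recognised from its context, so only the comment is removed.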

Answer 5

test.replace(/(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm, '');
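This pattern only looks for comment delimiters and never skips over strings, so anything that looks like a comment inside a string is removed too. A sketch with a made-up input (the pattern is our reading of the answer's line, which was garbled in extraction):

```javascript
// Answer 5's pattern as we read it.
const SIMPLE_RE = /(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm;

const src = 'var url = "http://example.com"; // home page';
console.log(src.replace(SIMPLE_RE, "")); // var url = "http:
```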

Comprehensive RegExp to remove JavaScript comments
