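I'm trying to write a regular expression (in JavaScript) that matches a multi-line comment at the beginning of a JS file.
So far I've come up with this: /^(\/\*[^\*\/]*\*\/)/g
It works for single-line comments: http://refiddle.com/24o
My problem is that it doesn't work for multi-line comments: http://refiddle.com/24m
Do you have any idea how to fix it?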
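Answer 0 (score: 4)
Like HTML, JavaScript cannot be parsed by regular expressions. Trying to get it right is futile.
Instead, you have to use a parser that correctly turns JavaScript source code into an AST, which you can then inspect programmatically. Fortunately, there are libraries that do the parsing for you.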
Here's a working example that outputs the AST of this code:
/* this is a
multi-line
comment */
var test = "this is a string, /* and this is not a comment! */";
// ..but this is
Which gives us:
[
"toplevel",
[
[
{
"name": "var",
"start": {
"type": "keyword",
"value": "var",
"line": 5,
"col": 4,
"pos": 57,
"endpos": 60,
"nlb": true,
"comments_before": [
{
"type": "comment2",
"value": " this is a\n multi-line\n comment ",
"line": 1,
"col": 4,
"pos": 5,
"endpos": 47,
"nlb": true
}
]
},
"end": {
"type": "punc",
"value": ";",
"line": 5,
"col": 67,
"pos": 120,
"endpos": 121,
"nlb": false,
"comments_before": []
}
},
[
[
"test",
[
{
"name": "string",
"start": {
"type": "string",
"value": "this is a string, /* and this is not a comment! */",
"line": 5,
"col": 15,
"pos": 68,
"endpos": 120,
"nlb": false,
"comments_before": []
},
"end": {
"type": "string",
"value": "this is a string, /* and this is not a comment! */",
"line": 5,
"col": 15,
"pos": 68,
"endpos": 120,
"nlb": false,
"comments_before": []
}
},
"this is a string, /* and this is not a comment! */"
]
]
]
]
]
]
Now just loop over the AST and extract what you need.
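As a rough illustration of that last step, here is a minimal sketch that walks an AST shaped like the output above and collects every entry stored under a `comments_before` key. The helper name `collectComments` and the trimmed `ast` literal are hypothetical, written only for this example; the sketch assumes that `comment2` marks a block comment, as it does in the output above, and it does not call any parser library.

```javascript
// Hypothetical helper: recursively collect every entry found under a
// "comments_before" key in an AST shaped like the output above.
function collectComments(node, found = []) {
  if (Array.isArray(node)) {
    for (const child of node) collectComments(child, found);
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key === "comments_before" && Array.isArray(value)) {
        found.push(...value);
      } else {
        collectComments(value, found);
      }
    }
  }
  return found;
}

// Trimmed, hand-copied version of the AST above (only the fields the walker needs).
const ast = [
  "toplevel",
  [
    [
      {
        name: "var",
        start: {
          type: "keyword",
          value: "var",
          comments_before: [
            { type: "comment2", value: " this is a\n multi-line\n comment ", line: 1 }
          ]
        },
        end: { type: "punc", value: ";", comments_before: [] }
      },
      [["test", [{ name: "string" }, "this is a string, /* and this is not a comment! */"]]]
    ]
  ]
];

// Assumption: "comment2" is the token type used for /* ... */ comments.
const blockComments = collectComments(ast).filter((c) => c.type === "comment2");
console.log(blockComments.map((c) => c.value));
```

Because the string literal is stored as an ordinary value rather than under `comments_before`, the `/* ... */` text inside it is not reported as a comment.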
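Answer 1 (score: 2)
The regex you proposed doesn't work because there is a * inside the comment: the character class [^\*\/] refuses to match any * or / in the comment body.
Also, it only looks for a comment at the very beginning of the file.
Try this instead: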
/\/\*[\s\S]*?\*\//
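To see the difference concretely, the following sketch runs the question's pattern and a lazy `[\s\S]*?` pattern against two made-up inputs. The sample strings and variable names are assumptions for illustration, and the `^` anchor on `relaxed` is added here only to keep the question's "start of file" requirement.

```javascript
// The pattern from the question (without the g flag, which test() does not need).
const original = /^(\/\*[^\*\/]*\*\/)/;
// Lazy [\s\S]*? body; the ^ anchor is an addition for this sketch.
const relaxed = /^\/\*[\s\S]*?\*\//;

// Hypothetical inputs: one comment without inner stars, one with them.
const plain = "/* one\n   two */\nvar a = 1;";
const starred = "/*\n * one\n * two\n */\nvar a = 1;";

console.log(original.test(plain));     // true: newlines alone are not the problem
console.log(original.test(starred));   // false: the inner * characters break the match
console.log(relaxed.exec(starred)[0]); // the whole comment, up to the first */
```

Note that the question's pattern already matches across newlines, since a negated character class accepts `\n`; it only fails once the comment body contains a `*` or `/`.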
Answer 2 (score: 2)
Answer 3 (score: 1)
Answer 4 (score: 0)
Here is one that matches any multi-line or single-line comment:
/(\/\*.*?\*\/|\/\/[^\n]+)/
If you only want multi-line matches, drop the second half:
/\/\*.*?\*\//
In both cases, make sure to set the s flag (dotAll, supported in JavaScript since ES2018) so that . matches newlines.
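A short sketch of what the s flag changes, assuming an engine with ES2018 dotAll support. The sample source string and the variable names are made up for this example, and the g flag on the combined pattern is added only so that `match` returns every hit.

```javascript
// Hypothetical input with one block comment and one line comment.
const source = "/* line one\n   line two */\nvar x = 1; // trailing note\n";

const withoutFlag = /\/\*.*?\*\//;   // . stops at newlines
const withFlag = /\/\*.*?\*\//s;     // . also matches newlines

console.log(withoutFlag.test(source));  // false
console.log(withFlag.exec(source)[0]);  // the full two-line comment

// Combined pattern from this answer, with g (added here) and s.
const anyComment = /(\/\*.*?\*\/|\/\/[^\n]+)/gs;
const all = source.match(anyComment);
console.log(all);
```

On engines without dotAll, replacing `.` with `[\s\S]` gives the same behavior without the flag.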
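Answer 5 (score: 0)
I'm not a JavaScript expert, but it seems that C/C++ style comments both have to be considered, and doing it correctly means that quoted strings (escapes and all) have to be handled along the way.
Below are two regex approaches that work. Regex 1 finds the first C-style comment directly: as soon as it matches, the comment is in capture group 1. Regex 2 is the general case. It matches a C-style comment, a C++-style comment, or a non-comment, is global, and lets you break out as soon as you find what you need.
Code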
var js = '\
// /* C++ comment */ \\\n\
/* C++ comment (cont) */ \n\
/* t "h /* is" \n\
is first C-style /* \n\
// comment */ \n\
and /*second C-style*/ \n\
then /*last C-style*/ \n\
';
var cmtrx1 = /^(?:\/\/(?:[^\\]|\\\n?)*?\n|(?:"(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[^\/"'\\]*))+(\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/)/;
var cmtrx2 = /(\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/)|(\/\/(?:[^\\]|\\\n?)*?)\n|(?:"(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^\/"'\\]*)/g;
//
print ('Script\n===========\n'+js+'\n===========\n\n');
var match;
//
print ("Using Regex 1\n---------------\n");
if ((match=cmtrx1.exec( js )) != null)
    print ("Found C style comment:\n'" + match[1] + "'\n\n");
//
print ("Using Regex 2\n---------------\n");
while ((match=cmtrx2.exec( js )) != null)
{
    if (match[1] != undefined)
    {
        print ("- C style :\n'" + match[1] + "'\n");
        // break; // uncomment to stop after first c-style match
    }
    // comment this to not print it
    if (match[2] != undefined)
    {
        print ("- C++ style :\n'" + match[2] + "'\n");
    }
}
Output
Script
===========
// /* C++ comment */ \
/* C++ comment (cont) */
/* t "h /* is"
is first C-style /*
// comment */
and /*second C-style*/
then /*last C-style*/
===========
Using Regex 1
---------------
Found C style comment:
'/* t "h /* is"
is first C-style /*
// comment */'
Using Regex 2
---------------
- C++ style :
'// /* C++ comment */ \
/* C++ comment (cont) */ '
- C style :
'/* t "h /* is"
is first C-style /*
// comment */'
- C style :
'/*second C-style*/'
- C style :
'/*last C-style*/'
Expanded regexes
Regex 1:
/^(?:\/\/(?:[^\\]|\\\n?)*?\n|(?:"(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[^\/"'\\]*))+(\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/)/
/^
(?:
\/\/
(?: [^\\] | \\\n? )*?
\n
|
(?:
"
(?: \\[\S\s] | [^"\\] )*
"
| '
(?: \\[\S\s] | [^'\\] )*
'
| [^\/"'\\]*
)
)+
1 (
\/\* [^*]* \*+
(?: [^\/*] [^*]* \*+ )*
\/
1 )
/
Regex 2:
/(\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/)|(\/\/(?:[^\\]|\\\n?)*?)\n|(?:"(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^\/"'\\]*)/g
/
1 (
\/\* [^*]* \*+
(?: [^\/*] [^*]* \*+ )*
\/
1 )
|
2 (
\/\/
(?: [^\\] | \\\n? )*?
2 )
\n
|
(?:
"
(?: \\[\S\s] | [^"\\] )*
"
| '
(?: \\[\S\s] | [^'\\] )*
'
| [\S\s][^\/"'\\]*
)
/g
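To show why consuming strings matters, here is a small sketch that wraps Regex 2 in a hypothetical helper, `firstBlockComment`, and compares it with a naive lazy pattern. The helper name and the sample input are assumptions for illustration; the pattern itself is Regex 2 copied unchanged, and this remains a tokenizing heuristic rather than a full JavaScript parser.

```javascript
// Hypothetical wrapper around Regex 2: return the first C-style comment,
// skipping over string literals and C++-style comments on the way.
function firstBlockComment(source) {
  // Built inside the function so lastIndex always starts at 0.
  const tokenRx = /(\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/)|(\/\/(?:[^\\]|\\\n?)*?)\n|(?:"(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^\/"'\\]*)/g;
  let match;
  while ((match = tokenRx.exec(source)) !== null) {
    if (match[1] !== undefined) return match[1]; // group 1 = C-style comment
  }
  return null;
}

// Made-up input: comment-like text inside a string, then a real comment.
const sample = 'var s = "a /* not a comment */";\n/* real */\n';

const naive = /\/\*[\s\S]*?\*\//.exec(sample)[0];
console.log(naive);                     // "/* not a comment */" (inside the string)
console.log(firstBlockComment(sample)); // "/* real */"
```

The naive pattern is fooled by the string contents, while the tokenizing pattern consumes the whole string literal as one non-comment match before it reaches the real comment.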