I want to remove multi-line comments (Java / C / C++ / ...) from files. To do that, I wrote this regular expression:
/\*[^\*]*(\*+[^\*/][^\*]*)*\*+/
This regex works in Notepad++ and Geany (search and replace all with nothing). In VB.NET the same regex behaves differently.
I am using:
Microsoft Visual Studio 2010 (Version 10.0.40219.1 SP1Rel)
Microsoft .NET Framework (4.7.02053 SP1Rel)
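The files I run the replacement on are not complicated. I don't need to worry about quoted text that might contain the start or end of a comment.
@sln Thanks for your detailed answer. I'll quickly explain my regex as well, just as you did: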
/\*                      Find the beginning of the comment.
[^\*]*                   Match any chars, but not an asterisk.
                         We need to deal with finding an asterisk now:
(\*+[^\*/][^\*]*)*       This regex breaks down to:
    \*+                  Consume asterisk(s).
    [^\*/]               Match any other char that is not an asterisk or a / (would end the comment!).
    [^\*]*               Match any other chars that are not asterisks.
    ( )*                 Try to find more asterisks followed by other chars.
\*+/                     Match 1 to n asterisks and finish the comment with /.
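For anyone who wants to try this breakdown outside an editor, here is a minimal Python sketch that writes the same pattern in verbose mode, with one comment per piece. Python's `re` module is only a stand-in here, and nothing in it says how .NET treats the pattern. The names `COMMENT_RE` and `strip_block_comments` are made up for illustration.

```python
import re

# Same pattern as in the question, split up with re.VERBOSE so that
# each piece can carry its own comment.
COMMENT_RE = re.compile(r"""
    /\*                      # beginning of the comment
    [^\*]*                   # any chars, but not an asterisk
    ( \*+ [^\*/] [^\*]* )*   # asterisk(s), one char that is neither * nor /, then non-asterisks
    \*+ /                    # 1 to n asterisks, then / closes the comment
""", re.VERBOSE)


def strip_block_comments(text):
    """Remove every /* ... */ match (hypothetical helper name)."""
    return COMMENT_RE.sub("", text)


# Two comments with no space between them, as in the second snippet.
print(repr(strip_block_comments("a /* x *//* y */ b")))
```

In this sketch the two adjacent comments are matched one after the other, so only the text around them remains.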
Here are two code snippets:
First:
text
/*
* block comment
*
*/ /* comment1 */ /* comment2 */
My text to keep.
/* more comments */
more text
Second:
text
/*
* block comment
*
*/ /* comment1 *//* comment2 */
My text to keep.
/* more comments */
more text
The only difference is
/* comment1 *//* comment2 */
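Removing the matches with Notepad++ and Geany works perfectly in both cases. With the regex in VB.NET, the second example does not work. After removal, the second example looks like this: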
text
more text
But it should look like this:
text
My text to keep.
more text
I am using System.Text.RegularExpressions:
Imports System.Text.RegularExpressions ' goes at the top of the source file
Dim content As String = IO.File.ReadAllText(file_path_)
Dim multiline_comment_remover As Regex = New Regex("/\*[^\*]*(\*+[^\*/][^\*]*)*\*+/")
content = multiline_comment_remover.Replace(content, "")
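As a cross-check, the three VB.NET lines above can be mirrored in Python. This is only a sketch: the second snippet is inlined as a string instead of being read from a file, and the leading spaces in front of the asterisks are an assumption about the original layout. It shows what Python's `re` engine does with the pattern and does not explain the .NET behaviour.

```python
import re

# The second snippet from the question, inlined instead of read from a file.
# Indentation before the asterisks is assumed.
SECOND_SNIPPET = (
    "text\n"
    "/*\n"
    " * block comment\n"
    " *\n"
    " */ /* comment1 *//* comment2 */\n"
    "My text to keep.\n"
    "/* more comments */\n"
    "more text\n"
)

# Same pattern string as in the VB.NET code.
multiline_comment_remover = re.compile(r"/\*[^\*]*(\*+[^\*/][^\*]*)*\*+/")
content = multiline_comment_remover.sub("", SECOND_SNIPPET)
print(content)
```

With Python's engine the line `My text to keep.` survives, which matches what Notepad++ and Geany produce.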
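I'd like to get the same result in VB.NET as in Notepad++ and Geany. As sln answered, my regex "should work, in a weird kind of way". The question is why VB.NET does not handle this regex as expected. That question is still open.
Since sln's answer gets my code working, I will accept it, although it does not explain why VB.NET doesn't like my regex. Thanks for your help! I learned a lot!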
Answer 0 (score: 0)
I think you could use a general-purpose C++ comment stripper.
Basically:
Find globally with the regex below, replace with $2.
Demo PCRE: https://regex101.com/r/UldYK5/1
Demo Python: https://regex101.com/r/avfSfB/1
# raw: (?m)((?:(?:^[ \t]*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|/\*|//)))?|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|/\*|//))|(?=\r?\n))))+)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|(?:\r?\n|[\S\s])[^/"'\\\s]*)
# delimited: /(?m)((?:(?:^[ \t]*)?(?:\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/)))?|\/\/(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/))|(?=\r?\n))))+)|((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\r?\n(?:(?=(?:^[ \t]*)?(?:\/\*|\/\/))|[^\/"'\\\r\n]*))+|[^\/"'\\\r\n]+)+|[\S\s][^\/"'\\\r\n]*)/
(?m) # Multi-line modifier
( # (1 start), Comments
(?:
(?: ^ [ \t]* )? # <- To preserve formatting
(?:
/\* # Start /* .. */ comment
[^*]* \*+
(?: [^/*] [^*]* \*+ )*
/ # End /* .. */ comment
(?: # <- To preserve formatting
[ \t]* \r? \n
(?=
[ \t]*
(?: \r? \n | /\* | // )
)
)?
|
// # Start // comment
(?: # Possible line-continuation
[^\\]
| \\
(?: \r? \n )?
)*?
(?: # End // comment
\r? \n
(?= # <- To preserve formatting
[ \t]*
(?: \r? \n | /\* | // )
)
| (?= \r? \n )
)
)
)+ # Grab multiple comment blocks if need be
) # (1 end)
| ## OR
( # (2 start), Non - comments
# Quotes
# ======================
(?: # Quote and Non-Comment blocks
"
[^"\\]* # Double quoted text
(?: \\ [\S\s] [^"\\]* )*
"
| # --------------
'
[^'\\]* # Single quoted text
(?: \\ [\S\s] [^'\\]* )*
'
| # --------------
(?: # Qualified line breaks
\r? \n
(?:
(?= # If comment ahead just stop
(?: ^ [ \t]* )?
(?: /\* | // )
)
| # or,
[^/"'\\\r\n]* # Chars that don't start a comment, string, escape,
# or line continuation (escape + newline)
)
)+
| # --------------
[^/"'\\\r\n]+ # Chars that don't start a comment, string, escape,
# or line continuation (escape + newline)
)+ # Grab multiple instances
| # or,
# ======================
# Pass through
[\S\s] # Any other char
[^/"'\\\r\n]* # Chars that don't start a comment, string, escape,
# or line continuation (escape + newline)
) # (2 end), Non - comments
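To show what "find globally, replace with $2" might look like in code, here is a small Python sketch that uses the raw form of the regex given above. The helper name `strip_comments` and the sample input are made up for illustration. The replacement is written as a function so that a match with no group 2 (a comment) becomes an empty string.

```python
import re

# Raw form of the regex, copied unchanged from the answer.
STRIP_RE = re.compile(
    r"""(?m)((?:(?:^[ \t]*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|/\*|//)))?|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|/\*|//))|(?=\r?\n))))+)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|(?:\r?\n|[\S\s])[^/"'\\\s]*)"""
)


def strip_comments(source):
    """Keep group 2 (non-comments), drop group 1 (comments). Hypothetical helper."""
    return STRIP_RE.sub(lambda m: m.group(2) or "", source)


# Made-up input: a string literal that looks like a comment, plus real comments.
sample = 's = "/* keep */"; /* drop */\nx = 1; // tail\ny = 2;\n'
print(strip_comments(sample))
```

Because quoted text is consumed by group 2, the comment-like text inside the string literal is left alone, while the real block and line comments are removed.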
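If you are using a particular engine that doesn't support assertions, you'll have to use this one instead.
It won't preserve formatting, though. Usage is the same as above.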
# (/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\]*)
( # (1 start), Comments
/\* # Start /* .. */ comment
[^*]* \*+
(?: [^/*] [^*]* \*+ )*
/ # End /* .. */ comment
|
// # Start // comment
(?: [^\\] | \\ \n? )*? # Possible line-continuation
\n # End // comment
) # (1 end)
|
( # (2 start), Non - comments
"
(?: \\ [\S\s] | [^"\\] )* # Double quoted text
"
| '
(?: \\ [\S\s] | [^'\\] )* # Single quoted text
'
| [\S\s] # Any other char
[^/"'\\]* # Chars that don't start a comment, string, escape,
# or line continuation (escape + newline)
) # (2 end)
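The assertion-free variant can be tried the same way. This is a rough Python sketch with a made-up helper name (`strip_comments_simple`) and a made-up input. It also illustrates the formatting caveat: the line comment is matched together with its trailing newline, so the following line gets joined to the previous one.

```python
import re

# Assertion-free form of the regex, copied unchanged from the answer.
SIMPLE_RE = re.compile(
    r"""(/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\]*)"""
)


def strip_comments_simple(source):
    """Keep group 2 (non-comments), drop group 1 (comments). Hypothetical helper."""
    return SIMPLE_RE.sub(lambda m: m.group(2) or "", source)


# Made-up input: quoted comment-like text, a block comment and a line comment.
sample_simple = 'a = "/* keep */"; /* drop */ b // tail\nc\n'
print(repr(strip_comments_simple(sample_simple)))
```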