Question

I have the following regular expression:

(\/\*([^*]|[\r\n]|(\*+([^*\/]|[\r\n])))*\*+\/)|(\/\/.*)

My test string is as follows:

<?
/* This is a comment */

cout << "Hello World"; // prints Hello World

/*
 * C++ comments can  also
 */

cout << "Hello World"; 

/* Comment out printing of Hello World:

cout << "Hello World"; // prints Hello World

*/

echo "//This line was not a Comment, but ... ";
echo "http://stackoverflow.com";
echo 'http://stackoverflow.com/you can not match this line';
array = ['//', 'no, you can not match this line!!']
/* This is * //a comment */

https://regex101.com/r/lx2f5F/1

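It correctly matches lines 2, 4, 7-9, and 13-17.

But it also matches inside the single-quoted (') and double-quoted (") strings and the array in the last lines. How can I make it match non-greedily?

Any help is appreciated.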
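To make the problem concrete, here is a minimal sketch that runs the pattern from the question over a few of the sample lines. It uses Python's re module purely for illustration (the question does not say which engine is in use), and the names ORIGINAL, SAMPLE, and matched_texts are hypothetical.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(\/\*([^*]|[\r\n]|(\*+([^*\/]|[\r\n])))*\*+\/)|(\/\/.*)")

# Three lines taken from the test string: two strings containing "//"
# and one genuine line comment.
SAMPLE = (
    'echo "//This line was not a Comment, but ... ";\n'
    'echo "http://stackoverflow.com";\n'
    'cout << "Hello World"; // prints Hello World\n'
)


def matched_texts(source):
    """Return every substring the question's pattern treats as a comment."""
    return [m.group(0) for m in ORIGINAL.finditer(source)]


for text in matched_texts(SAMPLE):
    print(repr(text))
```

On this sample every returned substring starts at a //, including the two that sit inside double-quoted strings. The pattern has no notion of string literals, which is why making a quantifier lazy does not help by itself.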

Answer 1

I believe I have a new best pattern:
/\/\*[\s\S]*?\*\/|(['"])[\s\S]+?\1(*SKIP)(*FAIL)|\/{2}.*/
It accurately processes the following block of text in 683 steps:

<?
/* This is a comment */

cout << "Hello World"; // prints Hello World

/*
 * C++ comments can  also
 */

cout << "Hello World"; 

/* Comment out printing of Hello World:

cout << "Hello World"; // prints Hello World

*/

echo "//This line was not a Comment, but ... ";
echo "http://stackoverflow.com";
echo 'http://stackoverflow.com/you can not match this line';
array = ['//', 'no, you can not match this line!!']
/* This is * //a comment */

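Pattern explanation: (Demo. You can use the substitution box at the bottom to replace the comment substrings with an empty string, effectively removing all comments.)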

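/\/\*[\s\S]*?\*\/ matches /*, then zero or more characters (lazily), then */
| or
(['"])[\s\S]+?\1(*SKIP)(*FAIL) does not match: ' or ", then one or more characters, then the same (captured) quote character; the string is consumed and then discarded
| or
\/{2}.*/ matches //, then zero or more non-newline characters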
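For readers whose engine lacks (*SKIP)(*FAIL), the three alternatives above can be approximated as follows. This is only a sketch under stated assumptions: it uses Python's re module, which does not support backtracking control verbs, so the quoted-string branch is captured and put back by a replacement callback instead of being skipped. The names COMMENT_OR_STRING and strip_comments are hypothetical, and this is a swapped-in technique rather than the pattern itself.

```python
import re

# Same three alternatives as the pattern above. The string branch is captured
# in a named group because Python's re has no (*SKIP)(*FAIL).
COMMENT_OR_STRING = re.compile(
    r"""/\*[\s\S]*?\*/                           # block comment
      | (?P<string>(?P<q>['"])[\s\S]+?(?P=q))    # quoted string: matched, then kept
      | //.*                                     # line comment
    """,
    re.VERBOSE,
)


def strip_comments(source):
    """Remove comments while leaving quoted strings untouched."""
    def keep_strings(match):
        # A string match is put back verbatim; a comment match is dropped.
        return match.group(0) if match.group("string") else ""
    return COMMENT_OR_STRING.sub(keep_strings, source)


demo = (
    'echo "http://stackoverflow.com"; // link\n'
    "/* a\n b */x = ['//', 'no']\n"
)
print(strip_comments(demo))
```

As with the pattern it imitates, the string branch does not account for escaped quotes or empty strings, so treat it as a starting point rather than a full tokenizer.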

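[\s\S] behaves like . except that it also matches newline characters, which is why it is used deliberately in the first two alternatives. The third alternative deliberately uses . so that it stops when it reaches a newline.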
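The difference between [\s\S] and . can be checked with a short sketch. It assumes Python's re module with no DOTALL flag, where . excludes the newline just as it does by default in PCRE; the variable names are illustrative only.

```python
import re

text = "/* line one\nline two */ code // tail\nnext"

# '.' cannot cross the newline, so the multi-line block comment is not found.
dot_block = re.search(r"/\*.*?\*/", text)

# [\s\S] matches any character, newline included, so the block is found.
any_block = re.search(r"/\*[\s\S]*?\*/", text)

# For a line comment, stopping at the newline is exactly what is wanted.
line_comment = re.search(r"//.*", text)

print(dot_block)
print(repr(any_block.group(0)))
print(repr(line_comment.group(0)))
```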

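I checked every ordering of the alternatives to make sure the fastest one comes first and that the pattern is optimized. My pattern correctly matches the OP's sample input. If anyone finds a problem with my pattern, please leave a comment so that I can try to fix it.

Jan's pattern correctly matches all of the substrings the OP wants in 1006 steps: ~([\'\"])(?<!\\).*?\1(*SKIP)(*FAIL)|(?|(?P<comment>(?s)\/\*.*?\*\/(?-s))|(?P<comment>\/\/.+))~gx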

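Sahil's pattern does not completely match the final comment in the UPDATED sample input. This means that either the question is wrong and should be closed as "unclear what you're asking", or Sahil's answer is wrong and should not have been awarded the green tick. When you updated the question, you should have asked Sahil to update his answer. When incorrect answers fail to satisfy the question, future SO readers may become confused, and SO becomes less reliable.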

Answer 2

With PCRE, you can use the (*SKIP)(*FAIL) mechanism:

([\'\"])(?<!\\).*?\1(*SKIP)(*FAIL)
|
(?|
    (?P<comment>(?s)/\*.*?\*/(?-s))
    |
    (?P<comment>//.+)
)

See a working demo on regex101.com. Note: the branch reset (?|...) is not really needed here; it is only used to tidy up the group named comment.
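If the goal is to collect the comments rather than delete them, the same idea can be sketched outside PCRE. The following is a simplified adaptation under stated assumptions, not the pattern above: Python's re has neither (*SKIP)(*FAIL) nor branch reset and rejects duplicate group names, so a single comment group wraps both comment forms, the string branch is matched and then ignored in code, and the lookbehind from the pattern above is left out. TOKEN and extract_comments are hypothetical names.

```python
import re

# String literals are matched first so that a "//" or "/*" inside them is
# consumed as part of the string and never reaches the comment branch.
TOKEN = re.compile(
    r"""(?P<string>(?P<q>['"]).*?(?P=q))       # single-line quoted string
      | (?P<comment>/\*[\s\S]*?\*/ | //.+)     # block comment or line comment
    """,
    re.VERBOSE,
)


def extract_comments(source):
    """Return the text of every comment that lies outside a quoted string."""
    return [
        m.group("comment")
        for m in TOKEN.finditer(source)
        if m.group("comment")
    ]


demo_source = (
    'cout << "Hello World"; // prints Hello World\n'
    "echo 'http://stackoverflow.com/x';\n"
    "/* This is * //a comment */\n"
)
print(extract_comments(demo_source))
```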

Regex for excluding comments?

2 answers: