I need to match and replace some comments. For example:
$test = "the url is http://www.google.com";// comment "<-- that quote needs to be matched
I want to match the comments that are outside of the quotes and replace any " in those comments with &quot;.
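I have tried many patterns and different ways of running them, but with no luck.
The regex will be run in JavaScript to match PHP "//" comments.
Update: I took the regex from borkweb's answer below and modified it. Using a function from http://ejohn.org/blog/search-and-dont-replace/, I came up with this: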
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<script type="text/javascript">
function t_replace(data){
var q = {}, ret = "";
data.replace(/(?:((["'\/]*(("[^"]*")|('[^']*'))?[\s]*)?[\/\/|#][^"|^']*))/g, function(value){
q[key] = value;
});
for ( var key in q ){
ret = q[key];
}
var text = data.split(ret);
var out = ret + text[1];
out = out.replace(/"/g,"&quot;");
out = out.replace(/'/g,"&apos;");
return text[0] + out;
}
</script>
</head>
<body>
<script type="text/javascript">
document.write(t_replace("$test = \"the url is http://www.google.com\";// c'o\"mment \"\"\"<-- that quote needs to be matched")+"<br>");
document.write(t_replace("$test = 'the url is http://www.google.com';# c'o\"mment \"\"\"<-- that quote needs to be matched"));
</script>
</body>
</html>
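It handles all line comments that are outside of single or double quotes. Is there any way I can optimize this function?
Update 2: It does not handle this string: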
document.write(t_replace("$test //= \"the url is http://www.google.com\"; //c'o\"mment \"\"\"<-- that quote needs to be matched")+"<br>");
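Answer 0: (score: 12)
You can use one regex that matches all strings and comments at the same time. If a match is a string, replace it with itself, unchanged; if it is a comment, handle it as the special case.
I came up with this regex: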
"(\\[\s\S]|[^"])*"|'(\\[\s\S]|[^'])*'|(\/\/.*|\/\*[\s\S]*?\*\/)
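It has 3 parts:
"(\\[\s\S]|[^"])*" matches double-quoted strings.
'(\\[\s\S]|[^'])*' matches single-quoted strings.
(\/\/.*|\/\*[\s\S]*?\*\/) matches single-line and multi-line comments.
The replacement function checks whether the matched text is a comment. If it is not, the text is returned unchanged. If it is, " is replaced with &quot; and ' with &apos;.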
function t_replace(data){
var re = /"(\\[\s\S]|[^"])*"|'(\\[\s\S]|[^'])*'|(\/\/.*|\/\*[\s\S]*?\*\/)/g;
return data.replace(re, function(all, strDouble, strSingle, comment) {
if (comment) {
return all.replace(/"/g, '&quot;').replace(/'/g, '&apos;');
}
return all;
});
}
Test run:
Input: $test = "the url is http://www.google.com";// c'o"mment """<-- that quote needs to be matched
Output: $test = "the url is http://www.google.com";// c&apos;o&quot;mment &quot;&quot;&quot;<-- that quote needs to be matched
Input: $test = 'the url is http://www.google.com';# c'o"mment """<-- that quote needs to be matched
Output: $test = 'the url is http://www.google.com';# c'o"mment """<-- that quote needs to be matched
Input: $test //= "the url is http://www.google.com"; //c'o"mment """<-- that quote needs to be matched
Output: $test //= &quot;the url is http://www.google.com&quot;; //c&apos;o&quot;mment &quot;&quot;&quot;<-- that quote needs to be matched
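The second input uses a # comment, which the pattern above does not treat as a comment, so the quotes after the # are left alone. A minimal sketch of one way to cover it, assuming # always starts a comment when it appears outside a string; the name escapeCommentQuotes is hypothetical and not part of the answer:

```javascript
// Hypothetical helper (an assumption, not the answer's code):
// same string-or-comment approach, with "#" added as a comment marker.
function escapeCommentQuotes(source) {
  // String alternatives come first, so markers inside strings (e.g. http://) are skipped.
  // Only the comment alternative is captured (group 1).
  const re = /"(?:\\[\s\S]|[^"\\])*"|'(?:\\[\s\S]|[^'\\])*'|(\/\/.*|#.*|\/\*[\s\S]*?\*\/)/g;
  return source.replace(re, (all, comment) =>
    comment === undefined
      ? all // a string literal: leave it untouched
      : all.replace(/"/g, "&quot;").replace(/'/g, "&apos;")
  );
}

const line = "$test = 'the url is http://www.google.com';# c'o\"mment";
console.log(escapeCommentQuotes(line));
```

The string alternatives are non-capturing here, so the callback only needs to check a single group.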
Answer 1: (score: 2)
I have to admit, this regex took me a while to produce... but I am fairly sure it will do what you want:
<script>
var str = "$test = \"the url is http://www.google.com\";// comment \"\"\"<-- that quote needs to be matched";
var reg = /^(?:(([^"'\/]*(("[^"]*")|('[^']*'))?[\s]*)?\/\/[^"]*))"/g;
while( str !== (str = str.replace( reg, "$1&quot;") ) );
console.log( str );
</script>
Here is what happens in the regex:
^ # start with the beginning of the line
(?: # don't capture the following
(
([^"'\/]* # start the line with any character as long as it isn't a string or a comment
(
("[^"]*") # grab a double quoted string
| # OR
('[^']*') # grab a single quoted string
)? # but...we don't HAVE to match a string
[\s]* # allow for any amount of whitespace
)? # but...we don't HAVE to have any characters before the comment begins
\/\/ # match the start of a comment
[^"]* # match any number of characters that isn't a double quote
) # end un-caught grouping
) # end the non-capturing declaration
" # match your commented double quote
The while loop in the JavaScript simply keeps finding and replacing until no further match can be found in the given line.
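The replace-until-nothing-changes loop can be written out more explicitly. The sketch below uses a deliberately simplified pattern (an assumption for illustration: the line starts with //) rather than the full regex above, and the helper name replaceUntilStable is hypothetical:

```javascript
// Hypothetical helper: apply one replacement per pass until the string stops changing.
function replaceUntilStable(str, re, replacement) {
  let previous;
  do {
    previous = str;
    str = str.replace(re, replacement);
  } while (str !== previous);
  return str;
}

// Simplified pattern (assumption): a line that begins with "//".
// Each pass escapes the first double quote still left in the comment.
const firstQuoteInComment = /^(\/\/[^"]*)"/;
console.log(replaceUntilStable('// say "hi"', firstQuoteInComment, "$1&quot;"));
```

Each pass consumes one quote, so the loop runs once per quote plus one final pass that detects no change.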
Answer 2: (score: 1)
Don't forget that PHP comments can also take the form /* this is a comment */, which can span multiple lines.
You may be interested in this site:
http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript
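JavaScript's regex engine has historically had no native lookbehind support (it was only added in ES2018). What you can do instead is anchor at the end of the line and capture any characters that follow a semicolon + optional whitespace + //, so something like: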
;\s*\/\/(.+)$
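This may not catch everything.
You may also want to look for a PHP syntax checker written in JavaScript (or another language). I think Komodo Edit's PHP syntax checker may be written in JavaScript. If so, it may give you insight into how to strip out everything but the comments, since a syntax checker needs to make sure the PHP code is valid, comments and all. The same goes for syntax highlighters. Here are two more links: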
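As a rough sketch of that idea, using \s* for the optional whitespace: the helper name trailingComment is hypothetical, and the pattern assumes the comment directly follows a statement-ending semicolon, so a string containing ";//" would still fool it.

```javascript
// Hypothetical helper: returns the text after ";", optional whitespace and "//",
// or null when the line has no such trailing comment.
function trailingComment(line) {
  const match = /;\s*\/\/(.+)$/.exec(line);
  return match ? match[1] : null;
}

// The "//" inside http:// is not preceded by a semicolon, so it is not picked up.
console.log(trailingComment('$test = "the url is http://www.google.com";// comment'));
console.log(trailingComment("$x = 1;"));
```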
http://ecoder.quintalinda.com/
http://www.webdesignbooth.com/9-useful-javascript-syntax-highlighting-scripts/
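Answer 3: (score: 0)
As a complement to @Thai's answer, which I found very good, I would like to add a little:
In this example, the original regex captures only the last character of the quoted content: https://regex101.com/r/CoxFvJ/2
So I modified it slightly to capture the full quoted content, and provided a more verbose and generic sample: https://regex101.com/r/CoxFvJ/3
So the final regex: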
/"((?:\\"|[^"])*)"|'((?:\\'|[^'])*)'|(\/\/.*|\/\*[\s\S]*?\*\/)/g
Many thanks to Thai for unblocking me.
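The difference between the two patterns comes from how a repeated capturing group behaves: it keeps only its last repetition, whereas a capturing group wrapped around a non-capturing repetition keeps the whole content. A small sketch on a sample string chosen here for illustration:

```javascript
const sample = '"abc"';

// Original form: the capturing group itself is repeated,
// so it ends up holding only the last repetition.
const repeated = /"(\\[\s\S]|[^"])*"/.exec(sample);

// Modified form: the repetition is non-capturing and wrapped in one capturing group,
// so the group holds everything between the quotes.
const wrapped = /"((?:\\"|[^"])*)"/.exec(sample);

console.log(repeated[1]); // c
console.log(wrapped[1]);  // abc
```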