
Question

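I am trying to remove PHP code from a file using regular expressions. Some of the PHP is not well formatted, so there may be extra spaces and/or line breaks. For example: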

<?php require_once('some_sort_of_file.php'); 
                               ?>

I came up with the following regular expressions, which seem to work:

$initial_text = preg_replace('/\s+/', ' ', $initial_text);
$initial_text = preg_replace('/' . preg_quote('<?php') . '.*?' . preg_quote('?>') . '/', '', $initial_text);

But I was wondering whether there is a way to do this with a single regex statement, to speed things up.

Thanks!

Answer 1

A better approach: use the built-in tokenizer. Regexes have problems with parsing irregular languages like PHP. The tokenizer, on the other hand, parses PHP code the same way PHP itself does.

Example code:

// some dummy code to play with
$myhtml = '<html>
    <body>foo bar
    <?php echo "hello world"; ?>
    baz
    </body>
    </html>';

// Our own little function to do the heavy lifting
function strip_php($text) {
    // break the code into tokens
    $tokens = token_get_all($text);
    // loop over the tokens
    foreach($tokens as $index => $token) {
        // If the token is not an array (e.g., ';') or if it is not inline HTML, nuke it.
        if(!is_array($token) || token_name($token[0]) !== 'T_INLINE_HTML') {
            unset($tokens[$index]);
        }
        else { // otherwise, echo it or do whatever you want here
            echo $token[1];
        }
    }
}

strip_php($myhtml);

Output:

<html>
<body>foo bar
baz
</body>
</html>
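
If the stripped text is needed as a value rather than printed, the same tokenizer approach can collect the inline HTML into a string. This is a minimal sketch, not part of the original answer: the function name `strip_php_to_string` and the sample input are hypothetical, and it compares the token id against the `T_INLINE_HTML` constant instead of going through `token_name()`.

```php
<?php
// Hypothetical helper (name is illustrative): returns only the parts of
// $text that lie outside PHP tags.
function strip_php_to_string(string $text): string {
    $html = '';
    foreach (token_get_all($text) as $token) {
        // Single-character tokens such as ';' are plain strings, not arrays.
        // Array tokens carry [id, text, line]; T_INLINE_HTML marks non-PHP text.
        if (is_array($token) && $token[0] === T_INLINE_HTML) {
            $html .= $token[1];
        }
    }
    return $html;
}

// Sample input modeled on the question: the closing tag sits on its own line.
$sample = "<p>before</p>\n<?php require_once('some_sort_of_file.php'); \n      ?>\n<p>after</p>\n";
echo strip_php_to_string($sample);
```

Because the tokenizer treats a single newline directly after `?>` as part of the closing tag, that newline is dropped together with the PHP block.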


Answer 2

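You can make it a single regex by using the s modifier, which allows the dot to match newlines. I also added the i modifier to make it case-insensitive, in case you care about that: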

$initial_text = preg_replace('~<\?php.*?\?>~si', '', $initial_text );
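
To see what the `s` modifier buys you, here is a small sketch that runs the pattern above on a block whose closing tag is on a later line. The variable names and sample strings are made up for illustration. The second case is an assumption about input you may or may not have: a `?>` inside a PHP string literal ends the lazy match early, which is the kind of situation where the tokenizer from the other answer behaves differently.

```php
<?php
$pattern = '~<\?php.*?\?>~si';

// The PHP block spans two lines; with the s modifier the dot also matches "\n".
$multiline = "<p>a</p><?php require_once('some_sort_of_file.php'); \n      ?><p>b</p>";
$result = preg_replace($pattern, '', $multiline);
// $result is now "<p>a</p><p>b</p>"

// Caveat: the lazy match stops at the first "?>", even inside a string literal,
// so part of the PHP statement is left behind.
$tricky = '<?php echo "?>"; ?>rest';
$leftover = preg_replace($pattern, '', $tricky);
// $leftover is now '"; ?>rest'
```

Whether the caveat matters depends on the files being processed; for simple includes like the one in the question, the single regex is enough.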
