Question

I am using curl to fetch and parse some HTML. The HTML source of some sites looks like this:

<div id="content">
    some words
</div>
<?    
    $box_social['dimensioni']="80";
        $box_vota=array();
    $box_vota["novideo"]='';
    $box_vota["nofoto"]='';
    $box_vota["id_articolo"]='1003691';
    include($_SERVER['DOCUMENT_ROOT']."/incs/box_social.php");    
?>
<div id="footer">
   some words
</div>

How can I remove the PHP short-tag block from the HTML source? I need:

<div id="content">
    some words
</div>
<div id="footer">
   some words
</div>

I used preg_replace('/<\?(.*?)\?>/','',$html); but the PHP short-tag block is still there.

Answer 1

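Either of these regular expressions fits your case. Use one or the other, not both in sequence, since calling htmlspecialchars twice would double-escape the output. Both let the match span line breaks, which the dot in your pattern cannot do without the s modifier:

$html = htmlspecialchars(preg_replace('/<\?([\w\W]*)\?>/','',$html));

or

$html = htmlspecialchars(preg_replace('/<\?(.*)\?>/s','',$html));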

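If the source contains several PHP blocks, the greedy patterns above would also remove everything between the first <? and the last ?>. The following pattern matches each block separately, provided the PHP code inside a block contains no ? or > character: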

$html = htmlspecialchars(preg_replace('/<\?([^\?>]*)\?>/','',$html));
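
A lazy quantifier combined with the s modifier is another way to handle several blocks, and it should keep working when the PHP code itself contains ? or >. The following is only a minimal sketch: the function name strip_php_blocks and the sample input are made up for illustration and are not part of the original question.

```php
<?php
// Hypothetical helper: removes every PHP block (short or long open tag) from a string.
function strip_php_blocks(string $html): string
{
    // .*? is lazy, so each match ends at the nearest closing tag;
    // the s modifier lets the dot match newlines inside a block.
    // preg_replace returns null on error, so fall back to the input.
    return preg_replace('/<\?.*?\?>/s', '', $html) ?? $html;
}

// Illustrative input with two blocks, one of which contains a > character.
$sample = "<div id=\"content\">a</div>\n<? \$x = 1 > 0;\n?>\n"
        . "<div id=\"mid\">b</div>\n<?php echo 'hi'; ?>\n<div id=\"footer\">c</div>";

echo strip_php_blocks($sample);
```

Note that this pattern would also strip an XML declaration such as <?xml ... ?>, because it starts with the same two characters.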

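From php.net:

s (PCRE_DOTALL): If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.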
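
The effect of the modifier can be checked with a small experiment. This is a sketch with a made-up two-line input; the variable names are illustrative only.

```php
<?php
// A short-tag block that spans two lines.
$text = "<? \$a = 1;\n\$b = 2; ?>";

// Without s, the dot stops at the newline, so the pattern finds nothing.
$without = preg_match('/<\?(.*?)\?>/', $text);

// With s, the dot also matches the newline and the block is found.
$with = preg_match('/<\?(.*?)\?>/s', $text);

// A negated character class matches newlines regardless of the modifier.
$negated = preg_match('/<\?([^>]*)\?>/', $text);

var_dump($without, $with, $negated);
```

The first call reports no match while the other two report one, which is why the pattern in the question left the multi-line block in place.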
