Question

I have a string that contains PHP code, and I need to remove the PHP code from the string. For example:

<?php $db1 = new ps_DB() ?><p>Dummy</p>

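should return <p>Dummy</p>

A string without PHP, for example <p>Dummy</p>, should be returned unchanged.

I know this can be done with a regular expression, but after 4 hours I still haven't found a solution.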

Answer 1

 <?php
 function filter_html_tokens($a){
    return is_array($a) && $a[0] == T_INLINE_HTML ?
      $a[1]:
      '';
 }
 $htmlphpstring = '<a>foo</a> something <?php $db1 = new ps_DB() ?><p>Dummy</p>';
 echo implode('',array_map('filter_html_tokens',token_get_all($htmlphpstring)));
 ?>

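As ircmaxell pointed out: this requires valid PHP!

A regex route would be the following (it allows short tags without 'php', it allows for no closing ?> in the string/file (for some reason Zend recommends this?), and of course it uses an ungreedy and DOTALL pattern):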

preg_replace('/<\\?.*(\\?>|$)/Us', '',$htmlphpstring);
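
As a quick check of that pattern, here is a minimal sketch that wraps it in a function and runs it on the inputs from the question. The name strip_php_regex is hypothetical and not part of the answer; the pattern is the same one, written with single backslashes, which is equivalent inside a single-quoted PHP string.

```php
<?php
// Hypothetical wrapper around the pattern from the answer.
// U = ungreedy quantifiers, s = DOTALL (the dot also matches newlines).
function strip_php_regex(string $source): string
{
    return preg_replace('/<\?.*(\?>|$)/Us', '', $source);
}

// A closed PHP block followed by HTML.
echo strip_php_regex('<?php $db1 = new ps_DB() ?><p>Dummy</p>'), "\n";

// No PHP at all: the string comes back unchanged.
echo strip_php_regex('<p>Dummy</p>'), "\n";

// An opening tag with no closing tag: everything up to the end is removed.
echo strip_php_regex('<p>Dummy</p><?php echo 1;'), "\n";
```

Note that the pattern only looks at the tags, so a ?> that appears inside a PHP string literal would end the match early.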

Answer 2

Well, you could use DomDocument to do this...

function stripPHPFromHTML($html) {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    removeProcessingInstructions($dom);
    $simple = simplexml_import_dom($dom->getElementsByTagName('body')->item(0));
    return $simple->children()->asXML();
}

function removeProcessingInstructions(DOMNode $node) {
    // Iterate over a copy: childNodes is a live list, so removing
    // nodes from it while looping would skip siblings.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMProcessingInstruction) {
            $node->removeChild($child);
        } elseif ($child->hasChildNodes()) {
            removeProcessingInstructions($child);
        }
    }
}

These two functions will turn $str into the markup shown in $html:

$str = '<?php echo "foo"; ?><b>Bar</b>';
$clean = stripPHPFromHTML($str);
$html = '<b>Bar</b>';

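Edit: Actually, after looking at Wrikken's answer, I realized that both methods have a disadvantage. Mine requires somewhat valid HTML markup (Dom is decent, but it won't parse {{ 1}}). Wrikken's requires valid PHP (any syntax error and it will fail). So perhaps use a combination of the two: try one first; if it fails, try the other. If both fail, there's really not much you can do without figuring out exactly why they failed.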
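
The "try one, then fall back to the other" idea could be sketched roughly as follows. This is only an illustration under assumptions: it swaps in Wrikken's regex as the fallback instead of the DomDocument approach, the function names are hypothetical, and it relies on the TOKEN_PARSE flag (PHP 7 and later) so that token_get_all() throws a ParseError on invalid PHP.

```php
<?php
// Hypothetical helper: keep only the inline HTML tokens.
// With TOKEN_PARSE, token_get_all() throws a ParseError on invalid PHP.
function strip_php_tokens(string $source): string
{
    $html = '';
    foreach (token_get_all($source, TOKEN_PARSE) as $token) {
        if (is_array($token) && $token[0] === T_INLINE_HTML) {
            $html .= $token[1];
        }
    }
    return $html;
}

// Hypothetical wrapper: try the tokenizer first and, if the PHP code
// does not parse, fall back to the tag-based regex.
function strip_php_with_fallback(string $source): string
{
    try {
        return strip_php_tokens($source);
    } catch (ParseError $e) {
        return preg_replace('/<\?.*(\?>|$)/Us', '', $source);
    }
}
```

Calling strip_php_with_fallback() on a string with a syntax error inside the PHP block, such as '<?php $x = ; ?><p>Dummy</p>', takes the regex path; valid input takes the tokenizer path.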

Answer 3

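A simple solution is to explode the string into arrays on the PHP tags, remove whatever lies between the tags, and implode the pieces back into a string.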

function strip_php($str) {

  $newstr = '';

  //split on opening tag
  $parts = explode('<?',$str);

  foreach($parts as $i => $part) {

      //the text before the first opening tag is never php code
      if($i === 0) {
          $newstr .= $part;
          continue;
      }

      //split on closing tag
      $partlings = explode('?>',$part);

      //remove content before closing tag
      $partlings[0] = '';

      //append to string
      $newstr .= implode('',$partlings);
  }
  return $newstr;
}

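This is slower than a regex, but it doesn't require valid HTML or PHP; it only requires that all PHP tags are closed.

For files that don't always include a final closing tag, and for general error checking, you could count the tags and append a closing tag if one is missing, or report an error if the opening and closing tags don't add up as expected, e.g. by adding the code below at the start of the function. This will slow it down a bit more, though :)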

  $tag_diff = substr_count($str,'<?') - substr_count($str,'?>');

  //Append if there's one less closing tag
  if($tag_diff == 1) $str .= '?>';

  //Parse error if the tags don't add up
  if($tag_diff < 0 || $tag_diff > 1) die('Error: Tag mismatch. 
  (Opening minus closing tags = '.$tag_diff.')<br><br>
  Dumping content:<br><hr><br>'.htmlentities($str));

Answer 4

If you are using PHP, you just need a regular expression that replaces anything matching PHP code.

The following statement will remove the PHP tag:

preg_replace('/^<\?php.*\?\>/', '', '<?php $db1 = new ps_DB() ?><p>Dummy</p>');

If no match is found, nothing is replaced.
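
One thing to be aware of: the pattern above is anchored with ^ and uses a greedy .*, so it only matches a block at the very start of the string and, when there are several blocks, it removes everything up to the last ?>. A possible variant, shown here only as a sketch with a hypothetical function name, drops the anchor, makes the quantifier lazy, and adds the s modifier so that blocks spanning several lines are matched too.

```php
<?php
// Hypothetical variant: remove every closed <?php ... ?> block,
// wherever it appears in the string.
function strip_php_blocks(string $source): string
{
    // .*? is lazy, so each match stops at the nearest closing tag;
    // the s modifier lets the dot match newlines.
    return preg_replace('/<\?php.*?\?>/s', '', $source);
}

$input = '<?php $a = 1; ?><p>Dummy</p><?php $b = 2; ?>';

// The anchored, greedy pattern from the answer swallows the HTML as well.
var_dump(preg_replace('/^<\?php.*\?\>/', '', $input));

// The lazy variant removes the two blocks and keeps the HTML between them.
var_dump(strip_php_blocks($input));
```

Unlike Wrikken's pattern, this variant leaves an unclosed <?php block and short <? tags untouched.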

Answer 5

This is an enhanced version of the strip_php suggested by @jon; it can replace the PHP code parts with another string:

/**
 * Remove PHP code part from a string.
 *
 * @param   string  $str            String to clean
 * @param   string  $replacewith    String to use as replacement
 * @return  string                  Result string without php code
 */
function dolStripPhpCode($str, $replacewith='')
{
    $newstr = '';

    //split on each opening tag
    $parts = explode('<?php',$str);
    if (!empty($parts))
    {
        $i=0;
        foreach($parts as $part)
        {
            if ($i == 0)    // The first part is never php code
            {
                $i++;
                $newstr .= $part;
                continue;
            }
            //split on closing tag
            $partlings = explode('?>', $part);
            if (!empty($partlings))
            {
                //remove content before closing tag
                if (count($partlings) > 1) $partlings[0] = '';
                //append to out string
                $newstr .= $replacewith.implode('',$partlings);
            }
        }
    }
    return $newstr;
}

How do I remove PHP code from a string?
