Question

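So, I need to remove <span> tags with the class tip. That would be <span class="tip"> and the corresponding </span>, and everything inside it...

I suspect a regular expression is needed, but I'm terrible at those.

...lol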

The following gives me no errors:

<?php
$string = 'April 15, 2003';
$pattern = '/(\w+) (\d+), (\d+)/i';
$replacement = '${1}1,$3';
echo preg_replace($pattern, $replacement, $string);
?>

But this gives me an error:

<?php
$str = preg_replace('<span class="tip">.+</span>', "", '<span class="rss-title"></span><span class="rss-link">linkylink</span><span class="rss-id"></span><span class="rss-content"></span><span class=\"rss-newpost\"></span>');
echo $str;
?>

Warning: preg_replace() [function.preg-replace]: Unknown modifier '.' in <A FILE> on line 4

Previously, the error was reported on the second line, but now... >.>

Answer 1

A simple regex such as:

<span class="tip">.+</span>

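won't work. If another span is opened and closed inside the tip span, the match will not end at the tip's closing tag: a lazy quantifier (.+?) stops at the inner span's </span>, while the greedy .+ shown here runs on to the last </span> it can reach. A DOM-based tool, such as the one linked in the comments, will give a far more reliable answer.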
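To make the nesting problem concrete, here is a minimal sketch. The sample markup is made up for illustration, and '#' is used as the pattern delimiter so that the angle brackets are matched literally.

```php
<?php
// Hypothetical input: a tip span that contains another span,
// followed by an unrelated span that should survive.
$html = '<p><span class="tip">outer <span>inner</span> tail</span> keep <span>me</span></p>';

// Lazy match: stops at the first </span>, which closes the INNER span.
$lazy = preg_replace('#<span class="tip">.+?</span>#', '', $html);

// Greedy match: runs to the last </span>, swallowing the unrelated span too.
$greedy = preg_replace('#<span class="tip">.+</span>#', '', $html);

echo $lazy, "\n";   // <p> tail</span> keep <span>me</span></p>
echo $greedy, "\n"; // <p></p>
```

Neither result is the intended "<p> keep <span>me</span></p>", which is why a plain regex is a poor fit here.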

As I said in my comment below, you need to add pattern delimiters when using regular expressions in PHP.

<?php
// '#' serves as the pattern delimiter; a backslash cannot be used as one.
$str = preg_replace('#<span class="tip">.+</span>#', "", '<span class="rss-title"></span><span class="rss-link">linkylink</span><span class="rss-id"></span><span class="rss-content"></span><span class="rss-newpost"></span>');
echo $str;
?>

might be slightly more successful. See the documentation page for the function in question.
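For context on the warning in the question: PHP accepts '<' and '>' as a bracket-style delimiter pair, so in '<span class="tip">.+</span>' everything after the first '>' is read as pattern modifiers, and '.' is not a valid one. The sketch below illustrates this with made-up input strings; the choice of '#' as the delimiter is just one option.

```php
<?php
// '<' and '>' are accepted as a bracket-style delimiter pair,
// so this pattern is only the word "span" and matches no angle brackets.
$a = preg_replace('<span>', 'X', 'a span b');

// With '#' as the delimiter, the angle brackets are matched literally.
$b = preg_replace('#<span>#', 'X', 'a <span> b');

// When the tag comes from a variable, preg_quote() escapes the delimiter
// and any regex metacharacters the tag contains.
$tag = '<span class="tip">';
$pattern = '#' . preg_quote($tag, '#') . '#';
$c = preg_replace($pattern, '', 'x<span class="tip">y');

echo $a, "\n"; // a X b
echo $b, "\n"; // a X b
echo $c, "\n"; // xy
```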

Answer 2

This is the "proper" way (adapted from this answer).

Input:

<?php
$str = '<div>lol wut <span class="tip">remove!</span><span>don\'t remove!</span></div>';
?>

Code:

<?php
function recurse($doc, $parent) {
   if (!$parent->hasChildNodes())
      return;

   // No $i++ after a removal: the next sibling slides into position $i
   for ($i = 0; $i < $parent->childNodes->length; ) {
      $elm = $parent->childNodes->item($i);
      // getAttribute() returns "" when the attribute is absent,
      // so a span without a class attribute is handled safely
      if ($elm instanceof DOMElement && $elm->nodeName == "span"
            && $elm->getAttribute("class") == "tip") {
         $parent->removeChild($elm);
         continue;
      }

      recurse($doc, $elm);
      $i++;
   }
}

// Load in the DOM (remembering that XML requires one root node)
$doc = new DOMDocument();
$doc->loadXML("<document>" . $str . "</document>");

// Iterate the DOM
recurse($doc, $doc->documentElement);

// Output the result
foreach ($doc->childNodes->item(0)->childNodes as $node) {
   echo $doc->saveXML($node);
}
?>

Output:

<div>lol wut <span>don't remove!</span></div>
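The same result can probably be reached with less code by letting XPath find the nodes. This is only a sketch under the same assumptions as above (the input is well-formed XML and the class attribute is exactly "tip"); it swaps the recursive walk for a DOMXPath query and is not part of the answer this was adapted from.

```php
<?php
$str = '<div>lol wut <span class="tip">remove!</span><span>don\'t remove!</span></div>';

// Load in the DOM (XML requires one root node, hence the wrapper)
$doc = new DOMDocument();
$doc->loadXML('<document>' . $str . '</document>');

// Select every span whose class attribute is exactly "tip"
$xpath = new DOMXPath($doc);
$tips = iterator_to_array($xpath->query('//span[@class="tip"]'));

// Detach each match from its parent; its content goes with it
foreach ($tips as $tip) {
    if ($tip->parentNode !== null) {
        $tip->parentNode->removeChild($tip);
    }
}

// Serialize the children of the wrapper element
$out = '';
foreach ($doc->documentElement->childNodes as $node) {
    $out .= $doc->saveXML($node);
}
echo $out; // <div>lol wut <span>don't remove!</span></div>
```

A span with several classes, such as class="tip big", would not be matched by this query; that case would need a different XPath expression.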

Answer 3

Now without a regex, and without heavy XML parsing:

$html = ' ... <span class="tip"> hello <span id="x"> man </span> </span> ... ';
$tag = '<span class="tip">';
$tag_close = '</span>';
$tag_familly = '<span';

$tag_len = strlen($tag);

$p1 = -1;
$p2 = 0;
while ( ($p2!==false)  && (($p1=strpos($html, $tag, $p1+1))!==false) ) {
  // the tag is found, now we will search for its corresponding closing tag
  $level = 1;
  $p2 = $p1;
  $continue = true; 
  while ($continue) {
     $p2 = strpos($html, $tag_close, $p2+1);
     if ($p2===false) {
       // error in the html contents, the analysis cannot continue
       echo "ERROR in html contents";
       $continue = false;
       $p2 = false; // will stop the loop
     } else {
       $level = $level -1;
       $x = substr($html, $p1+$tag_len, $p2-$p1-$tag_len);
       $n = substr_count($x, $tag_familly);
       if ($level+$n<=0) $continue = false;
     }
  }
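  if ($p2!==false) {
    // delete the pair of tags, the farthest one first so that $p1 stays valid;
    // the content between the two tags is kept
    $html = substr_replace($html, '', $p2, strlen($tag_close));
    $html = substr_replace($html, '', $p1, $tag_len);
    $p1--; // resume at the removal point, where a directly nested tag may now start
  }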
}
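The loop above removes only the two tags and keeps what was between them. Since the question asks for the content to go as well, here is a sketch of a variant that cuts the whole element. The function name remove_tag_with_content and its parameters are hypothetical, and it shares the assumption that every '<span' in the input opens a span and every '</span>' closes one.

```php
<?php
// Hypothetical helper: removes every $tag element together with its content.
function remove_tag_with_content(string $html, string $tag, string $tag_close, string $tag_family): string
{
    $offset = 0;
    while (($p1 = strpos($html, $tag, $offset)) !== false) {
        $depth = 1;                  // number of spans currently open
        $pos = $p1 + strlen($tag);   // scan position just past the opening tag
        while ($depth > 0) {
            $next_open = strpos($html, $tag_family, $pos);
            $next_close = strpos($html, $tag_close, $pos);
            if ($next_close === false) {
                return $html;        // unbalanced markup: leave the rest untouched
            }
            if ($next_open !== false && $next_open < $next_close) {
                $depth++;            // a nested span opens before the next close
                $pos = $next_open + strlen($tag_family);
            } else {
                $depth--;            // a span closes
                $pos = $next_close + strlen($tag_close);
            }
        }
        // $pos now points just past the matching closing tag
        $html = substr_replace($html, '', $p1, $pos - $p1);
        $offset = $p1;               // continue from the removal point
    }
    return $html;
}

$html = ' ... <span class="tip"> hello <span id="x"> man </span> </span> ... ';
$result = remove_tag_with_content($html, '<span class="tip">', '</span>', '<span');
echo $result; // " ...  ... "
```

Tracking the depth while scanning forward avoids recounting the opening tags on every step, at the cost of a slightly longer inner loop.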

Strip tags with a class in PHP
