How to replace text in HTML

Asked: 2009-10-14 11:49:26

Tags: php regex

Following on from this question: "What regex pattern do I need for this?", I have been using the following code:

function process($node, $replaceRules) {
    if($node->hasChildNodes()) {
       foreach ($node->childNodes as $childNode) {
         if ($childNode instanceof DOMText) {
           $text = preg_replace(
            array_keys($replaceRules),
            array_values($replaceRules),
            $childNode->wholeText
           );
           $node->replaceChild(new DOMText($text),$childNode);
          } else {
            process($childNode, $replaceRules);
          }
       }
    }
}

$replaceRules = array(
  '/\b(c|C)olor\b/' => '$1olour',
  '/\b(kilom|Kilom|M|m)eter/' => '$1etre',
);

$htmlString = "<p><span style='color:red'>The color of the sky is: gray</p>";
$doc = new DOMDocument();
$doc->loadHtml($htmlString);
process($doc, $replaceRules);
$string = $doc->saveHTML();
echo mb_substr($string,119,-15);

It works fine, but it fails when an element contains both text and HTML (because the child node gets replaced at the first instance). So it works for

<div>The distance is four kilometers</div>

but not for

<div>The distance is four kilometers<br>1000 meters to a kilometer</div>

<div>The distance is four kilometers<div class="guide">1000 meters to a kilometer</div></div>

Any ideas for an approach that would work on these examples?

1 answer:

Answer 0: (score: 2)

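Calling $node->replaceChild confuses the $node->childNodes iterator, because childNodes is a live list that changes while you loop over it. You can collect the child nodes first and then process them: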

function process($node, $replaceRules) {
    if($node->hasChildNodes()) {
        // Copy the child nodes into a plain array so that replaceChild()
        // below cannot disturb the iteration.
        $nodes = array();
        foreach ($node->childNodes as $childNode) {
            $nodes[] = $childNode;
        }
        foreach ($nodes as $childNode) {
            if ($childNode instanceof DOMText) {
                // Apply every regex rule to the text of this node.
                $text = preg_replace(
                    array_keys($replaceRules),
                    array_values($replaceRules),
                    $childNode->wholeText);
                $node->replaceChild(new DOMText($text),$childNode);
            }
            else {
                // Recurse into elements and other non-text nodes.
                process($childNode, $replaceRules);
            }
        }
    }
}
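
As a variation on the same idea, you could avoid replacing nodes at all and rewrite each text node's data in place, which leaves the tree structure untouched while you loop. The following is only a minimal sketch under that assumption: the helper name replaceInTextNodes is hypothetical (it is not from the question), and it selects text nodes with DOMXPath rather than recursing by hand. The last few lines also show one way to print just the contents of <body> without relying on hard-coded mb_substr offsets.

```php
<?php
// Hypothetical helper (not from the original post): rewrites every text
// node in place instead of replacing nodes while iterating.
function replaceInTextNodes(DOMDocument $doc, array $replaceRules)
{
    $xpath = new DOMXPath($doc);
    // '//text()' selects text nodes only, so attribute values such as
    // style='color:red' are never touched.
    foreach ($xpath->query('//text()') as $textNode) {
        $replaced = preg_replace(
            array_keys($replaceRules),
            array_values($replaceRules),
            $textNode->data
        );
        // preg_replace() returns null on a regex error; keep the old text then.
        if ($replaced !== null) {
            $textNode->data = $replaced;
        }
    }
}

$replaceRules = array(
    '/\b(c|C)olor\b/' => '$1olour',
    '/\b(kilom|Kilom|M|m)eter/' => '$1etre',
);

$doc = new DOMDocument();
$doc->loadHTML('<div>The distance is four kilometers<br>1000 meters to a kilometer</div>');
replaceInTextNodes($doc, $replaceRules);

// Serialize only the children of <body>, which avoids cutting the
// DOCTYPE/<html> wrapper off with hard-coded string offsets.
$html = '';
foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $html .= $doc->saveHTML($child);
}
echo $html, "\n";
```

Note that, like the recursive version, this sketch also visits text inside elements such as <script> or <style>; if that matters for your input, you would need a narrower XPath expression.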