Following on from this question: What regex pattern do I need for this? I have been using the following code:
function process($node, $replaceRules) {
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $childNode) {
            if ($childNode instanceof DOMText) {
                $text = preg_replace(
                    array_keys($replaceRules),
                    array_values($replaceRules),
                    $childNode->wholeText
                );
                $node->replaceChild(new DOMText($text), $childNode);
            } else {
                process($childNode, $replaceRules);
            }
        }
    }
}

$replaceRules = array(
    '/\b(c|C)olor\b/' => '$1olour',
    '/\b(kilom|Kilom|M|m)eter/' => '$1etre',
);
$htmlString = "<p><span style='color:red'>The color of the sky is: gray</p>";
$doc = new DOMDocument();
$doc->loadHTML($htmlString);
process($doc, $replaceRules);
$string = $doc->saveHTML();
echo mb_substr($string, 119, -15);
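It works fine, but it fails if an element contains a mix of text and HTML (because the child node is replaced at the first instance). So it works for
<div>The distance is four kilometers</div>
but not for
<div>The distance is four kilometers<br>1000 meters to a kilometer</div>
or
<div>The distance is four kilometers<div class="guide">1000 meters to a kilometer</div></div>
Any ideas for an approach that would work on these examples?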
Answer 0 (score: 2)
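Calling $node->replaceChild confuses the $node->childNodes iterator, because childNodes is a live list that changes as soon as the tree is modified. You can collect the child nodes into an array first and then process them: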
function process($node, $replaceRules) {
    if ($node->hasChildNodes()) {
        // Copy the children into a plain array before touching the tree.
        $nodes = array();
        foreach ($node->childNodes as $childNode) {
            $nodes[] = $childNode;
        }
        // Iterate over the copy, so replaceChild() cannot disturb the loop.
        foreach ($nodes as $childNode) {
            if ($childNode instanceof DOMText) {
                $text = preg_replace(
                    array_keys($replaceRules),
                    array_values($replaceRules),
                    $childNode->wholeText
                );
                $node->replaceChild(new DOMText($text), $childNode);
            } else {
                process($childNode, $replaceRules);
            }
        }
    }
}
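
To check the snapshot approach against the two inputs from the question, here is a minimal sketch. The name processSnapshot is hypothetical, and the sketch makes three substitutions that are not part of the answer: iterator_to_array() instead of the manual copy loop, nodeValue instead of wholeText, and createTextNode() on the owner document instead of new DOMText().

```php
<?php
// Hypothetical helper; same snapshot idea as the answer's process().
function processSnapshot(DOMNode $node, array $replaceRules): void
{
    if (!$node->hasChildNodes()) {
        return;
    }
    // iterator_to_array() copies the live childNodes list into a plain array,
    // so replacing children below cannot disturb the loop.
    foreach (iterator_to_array($node->childNodes) as $childNode) {
        if ($childNode instanceof DOMText) {
            $text = preg_replace(
                array_keys($replaceRules),
                array_values($replaceRules),
                $childNode->nodeValue
            );
            // Assumption: build the replacement node from the owning document.
            $replacement = $childNode->ownerDocument->createTextNode($text);
            $node->replaceChild($replacement, $childNode);
        } else {
            processSnapshot($childNode, $replaceRules);
        }
    }
}

$replaceRules = array(
    '/\b(c|C)olor\b/' => '$1olour',
    '/\b(kilom|Kilom|M|m)eter/' => '$1etre',
);

// The two inputs from the question that the original loop did not handle.
$samples = array(
    '<div>The distance is four kilometers<br>1000 meters to a kilometer</div>',
    '<div>The distance is four kilometers<div class="guide">1000 meters to a kilometer</div></div>',
);

$results = array();
foreach ($samples as $html) {
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $div = $doc->getElementsByTagName('div')->item(0);
    processSnapshot($div, $replaceRules);
    // Serialize only the outer <div> instead of trimming with mb_substr().
    $results[] = $doc->saveHTML($div);
}
echo implode("\n", $results), "\n";
```

In both samples the text before and after the nested element is rewritten. Passing the target node to saveHTML() is a design choice of this sketch; it removes the dependence on the fixed mb_substr() offsets, which only hold for one particular doctype and wrapper.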