Question

I have the following line:

$str = '<div class="hello"> Hello world &lt hello world?! </div>';

I need to find all matches inside the tags while avoiding matches in attribute values. I tried something like:

$pattern = '/(.*)(hello)(.*)(?=<\/)/ui'; 
$replacement = '$1<span style="background:yellow">$2</span>$3';

But it only matches one "hello". What should I do?

Answer 1

(*SKIP)(*F) syntax in Perl and PCRE (PHP, Delphi, R...)

With all the usual disclaimers about using regex to parse HTML, we can do this with a very simple regex:

<[^>]*>(*SKIP)(*F)|(hello)

Sample PHP code:

$replaced = preg_replace('~<[^>]*>(*SKIP)(*F)|(hello)~i',
                         '<span style="background:yellow">$1</span>',
                         $yourstring);

In the regex demo, see the substitutions at the bottom.

Explanation

This problem is a classic case of the technique explained in this question about "regex-match a pattern, excluding..."

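The left side of the alternation | matches complete <tags> and then deliberately fails, after which the engine skips ahead to the position just after the tag instead of retrying inside it. The right side captures hello (case-insensitively) into Group 1, and we know these are the right ones because they were not matched by the expression on the left.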
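To see the technique end to end, here is a minimal runnable sketch that applies the same pattern to the exact string from the question. It adds the u flag from the question's own attempt; the variable names $str, $pattern, $replacement and $result are only illustrative.

```php
<?php
// Input string taken verbatim from the question
$str = '<div class="hello"> Hello world &lt hello world?! </div>';

// Left branch: consume a whole tag <...>, then (*SKIP)(*F) fails the match and
// tells the engine to resume after the tag, so nothing inside a tag can match.
// Right branch: capture "hello" anywhere else; "i" = case-insensitive, "u" = UTF-8.
$pattern = '~<[^>]*>(*SKIP)(*F)|(hello)~iu';

// $1 re-inserts the captured text, so each match keeps its original casing
$replacement = '<span style="background:yellow">$1</span>';

$result = preg_replace($pattern, $replacement, $str);
echo $result, "\n";
// <div class="hello"> <span style="background:yellow">Hello</span> world &lt <span style="background:yellow">hello</span> world?! </div>
```

Both "Hello" and "hello" in the text are wrapped, while class="hello" is left alone because the whole opening tag was consumed (and skipped) by the left branch.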

Reference

Answer 2

Wrapping text matches in another element is a pretty basic operation, although the code is a bit tricky:

$html = <<<EOS
<div class="hello"> Hello world &lt; hello world?! </div>
EOS;

$dom = new DOMDocument;
$dom->loadHTML($html);

$search = 'hello';

foreach ($dom->getElementsByTagName('div') as $element) {
    foreach ($element->childNodes as $node) { // iterate all direct descendants
        if ($node->nodeType === XML_TEXT_NODE) { // and look for text nodes in particular
            // stripos: case-insensitive, like the /i flag in the question
            if (($pos = stripos($node->nodeValue, $search)) !== false) {
                // we split the text up in: <prefix> match <postfix>
                $match = substr($node->nodeValue, $pos, strlen($search)); // keeps original casing
                $postfix = substr($node->nodeValue, $pos + strlen($search));
                $node->nodeValue = substr($node->nodeValue, 0, $pos);

                // insert <postfix> behind the current text node
                $textNode = $dom->createTextNode($postfix);
                if ($node->nextSibling) {
                    $node->parentNode->insertBefore($textNode, $node->nextSibling);
                } else {
                    $node->parentNode->appendChild($textNode);
                }

                // wrap the match in a <span> and insert it between prefix and postfix
                $wrapNode = $dom->createElement('span');
                $wrapNode->setAttribute('style', 'background:yellow');
                $wrapNode->appendChild($dom->createTextNode($match));
                $node->parentNode->insertBefore($wrapNode, $textNode);
            }
        }
    }
}

echo $dom->saveHTML(), "\n";
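For comparison, here is a minimal sketch of the same DOM idea written with DOMXPath instead of walking childNodes by hand. It is a variation, not the answerer's code: it assumes every text node in the document may be highlighted (including nested ones), matches case-insensitively, and uses preg_split() with PREG_SPLIT_DELIM_CAPTURE so that every occurrence inside a text node is wrapped in one pass.

```php
<?php
$html = '<div class="hello"> Hello world &lt; hello world?! </div>';
$search = 'hello'; // word to highlight (illustrative)

$dom = new DOMDocument();
$dom->loadHTML($html); // libxml wraps the fragment in <html><body>

$xpath = new DOMXPath($dom);
// Copy the text nodes into an array first, since we replace them while looping
$textNodes = iterator_to_array($xpath->query('//text()'));

// Case-insensitive pattern with one capture group around the escaped search term
$pattern = '/(' . preg_quote($search, '/') . ')/i';

foreach ($textNodes as $textNode) {
    // With DELIM_CAPTURE, even indices hold plain text and odd indices hold matches
    $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
    if (count($parts) < 2) {
        continue; // no match in this text node
    }

    $fragment = $dom->createDocumentFragment();
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // Wrap the matched text (original casing kept) in a highlighted <span>
            $span = $dom->createElement('span');
            $span->setAttribute('style', 'background:yellow');
            $span->appendChild($dom->createTextNode($part));
            $fragment->appendChild($span);
        } elseif ($part !== '') {
            $fragment->appendChild($dom->createTextNode($part));
        }
    }

    // Put the new nodes where the old text node was, then drop the old one
    $parent = $textNode->parentNode;
    $parent->insertBefore($fragment, $textNode);
    $parent->removeChild($textNode);
}

// Print only the <div>, not the <html><body> wrapper libxml added
$div = $dom->getElementsByTagName('div')->item(0);
$output = $dom->saveHTML($div);
echo $output, "\n";
```

Attribute values are never touched because //text() selects only text nodes, and building the span content with createTextNode() means characters such as < or & inside the text are escaped on output rather than interpreted as markup.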

Replace all matches inside tags
