Using DOMDocument and preg_replace_callback to set tags in HTML

Time: 2019-07-06 21:56:26

Tags: php domdocument preg-replace-callback

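I am trying to replace words from my glossary with (HTML) anchors so that they get a tooltip. I have the replacement part working, but I cannot get the result back into the DOMDocument object.

I have written a recursive function that walks the DOM, visits every child node, searches for the glossary words, and replaces them with anchors.

I had been using a plain preg_match on the HTML for this, but that runs into problems once the HTML gets complex.

The recursive function: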

$terms = array(
   'example'=>'explanation about example'
);

function iterate_html($doc, $original_doc = null)
    {
    global $terms;

        if(is_null($original_doc)) {
            self::iterate_html($doc, $doc);
        }

        foreach($doc->childNodes as $childnode)
        {
            $children = $childnode->childNodes;
            if($children) {
                self::iterate_html($childnode);
            } else {

                $regexes = '~\b' . implode('\b|\b',array_keys($terms)) . '\b~i';
                $new_nodevalue = preg_replace_callback($regexes, function($matches) {
                    $doc = new DOMDocument();

                    $anchor = $doc->createElement('a', $matches[0]);
                    $anchor->setAttribute('class', 'text-info');
                    $anchor->setAttribute('data-toggle', 'tooltip');
                    $anchor->setAttribute('data-original-title', $terms[strtolower($matches[0])]);

                    return $doc->saveXML($anchor);

                }, $childnode->nodeValue);



                $dom = new DOMDocument();
                $template = $dom->createDocumentFragment();
                $template->appendXML($new_nodevalue);

                $original_doc->importNode($template->childNodes, true);
                $childnode->parentNode->replaceChild($template, $childnode);
            }
        }
    }

echo iterate_html('this is just some example text.');

I expect the result to be:

this is just some <a class="text-info" data-toggle="tooltip" data-original-title="explanation about example">example</a> text

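1 answer:

Answer 0: (score: 0)

I don't think building a recursive function to walk the DOM is useful when you have XPath queries at your disposal. Also, I'm not sure preg_replace_callback is the right tool for this case; I prefer preg_split. Here is an example: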

$html = 'this is just some example text.';

$terms = array(
   'example'=>'explanation about example'
);

// sort the keys by decreasing length
// (so that the longest term always wins, rather than the first one listed in the pattern)

uksort($terms, function ($a, $b) {
    $diff = mb_strlen($b) - mb_strlen($a);

    return ($diff) ? $diff : strcmp($a, $b);
});

// build the pattern with a capture group (so that PREG_SPLIT_DELIM_CAPTURE returns the matched terms along with the text between them)
$pattern = '~\b(' . implode('|', array_map(function($i) { return preg_quote($i, '~'); }, array_keys($terms))) . ')\b~i';

// prevent possible HTML parsing errors from being displayed
$libxmlInternalErrors = libxml_use_internal_errors(true);

// determine whether the HTML string already has a root html element; if not, add a fake root
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$fakeRootElement = false;

if ( $dom->documentElement->nodeName !== 'html' ) {
    $dom->loadHTML("<div>$html</div>", LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
    $fakeRootElement = true;
}

libxml_use_internal_errors($libxmlInternalErrors);

// find all text nodes (not already included in a link or between other unwanted tags)
$xp = new DOMXPath($dom);
$textNodes = $xp->query('//text()[not(ancestor::a)][not(ancestor::style)][not(ancestor::script)]');

// replacement
foreach ($textNodes as $textNode) {
    $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
    $fragment = $dom->createDocumentFragment();
    foreach ($parts as $k=>$part) {
        if ($k&1) {
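            // add the text as a text node: the value argument of createElement() does not escape "&"
            $anchor = $dom->createElement('a');
            $anchor->appendChild($dom->createTextNode($part));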
            $anchor->setAttribute('class', 'text-info');
            $anchor->setAttribute('data-toggle', 'tooltip');
            $anchor->setAttribute('data-original-title', $terms[strtolower($part)]);
            $fragment->appendChild($anchor);
        } else {
            $fragment->appendChild($dom->createTextNode($part));
        }
    }
    $textNode->parentNode->replaceChild($fragment, $textNode);
}


// build the result string
$result = '';

if ( $fakeRootElement ) {
    foreach ($dom->documentElement->childNodes as $childNode) {
        $result .= $dom->saveHTML($childNode);
    }
} else {
    $result = $dom->saveHTML();
}

echo $result;
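
To make the `$k&1` test in the loop above easier to follow, here is a minimal sketch of what preg_split returns with PREG_SPLIT_DELIM_CAPTURE. The second dictionary entry ('text') and the sample sentence are assumptions added only for this illustration.

```php
<?php
// Hypothetical two-term dictionary, used only for this illustration.
$terms = array(
    'example' => 'explanation about example',
    'text'    => 'explanation about text',
);

// Same pattern construction as in the answer: one capture group around all terms.
$quoted  = array_map(function ($t) { return preg_quote($t, '~'); }, array_keys($terms));
$pattern = '~\b(' . implode('|', $quoted) . ')\b~i';

// With PREG_SPLIT_DELIM_CAPTURE, the captured terms are kept in the result.
$parts = preg_split($pattern, 'this is just some Example text.', -1, PREG_SPLIT_DELIM_CAPTURE);

// Even indexes hold the text between matches, odd indexes hold the matched terms.
foreach ($parts as $k => $part) {
    echo $k, ' ', ($k & 1) ? 'term' : 'text', ' ', var_export($part, true), "\n";
}
```

Because the pattern has exactly one capture group and empty pieces are not dropped, plain text and matched terms strictly alternate, and a string that starts with a term simply yields an empty string at index 0. Here `$parts` is `['this is just some ', 'Example', ' ', 'text', '.']`.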


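Feel free to put this into one or more functions/methods, but keep in mind that this kind of editing has a non-negligible cost and should be done each time the HTML is edited (not each time it is displayed).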
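
As a rough sketch of that suggestion, the steps can be wrapped in a single function that is called when the HTML is saved. The name annotate_terms() and its signature are assumptions, not part of the original answer. It also assumes the input is an HTML fragment rather than a full document, so it always adds the fake root.

```php
<?php
// Hypothetical helper; name and signature are assumptions made for this sketch.
// Assumes $html is a fragment (no <html> root), so a fake <div> root is always added.
function annotate_terms(string $html, array $terms): string
{
    if ($html === '' || !$terms) {
        return $html;
    }

    // Lowercase the keys so the lookup agrees with the case-insensitive pattern.
    $terms = array_change_key_case($terms, CASE_LOWER);

    // Longest terms first, so the longest alternative wins.
    uksort($terms, function ($a, $b) {
        return (strlen($b) - strlen($a)) ?: strcmp($a, $b);
    });

    $quoted  = array_map(function ($t) { return preg_quote($t, '~'); }, array_keys($terms));
    $pattern = '~\b(' . implode('|', $quoted) . ')\b~i';

    // Parse quietly, then restore the previous libxml error setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);
    $query = '//text()[not(ancestor::a)][not(ancestor::style)][not(ancestor::script)]';

    foreach ($xp->query($query) as $textNode) {
        $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no term in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $k => $part) {
            if ($k & 1) {
                // Odd index: a matched term, wrapped in a tooltip anchor.
                $anchor = $dom->createElement('a');
                $anchor->appendChild($dom->createTextNode($part));
                $anchor->setAttribute('class', 'text-info');
                $anchor->setAttribute('data-toggle', 'tooltip');
                $anchor->setAttribute('data-original-title', $terms[strtolower($part)]);
                $fragment->appendChild($anchor);
            } elseif ($part !== '') {
                // Even index: plain text between terms.
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    // Serialize the children of the fake root only.
    $result = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}

echo annotate_terms(
    '<p>this is just some example text.</p>',
    array('example' => 'explanation about example')
);
```

The call at the end is intended to produce the anchor markup shown in the question, inside the original paragraph. Storing the returned string alongside the source HTML at save time keeps the DOM parsing out of the display path.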