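I'm working on this PHP function. The idea is to wrap certain words that occur in a string in certain tags (both the words and the tags are given in an array). It works fine! But when those words appear inside the text of a link or in the link's URL attribute, the link of course gets broken and stuffed with tags, or tags are generated that should not be inside a link. This is what I have now: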
function replace() {
    // Map each term to the tag that should wrap it.
    $terminos = array(
        "beneficios" => "h3",
        "valoracion" => "h2",
        "empresarios" => "h2",
        "tecnologias" => "h2",
        "...and so on..." => "etc",
    );
    // Defined once, outside the loop, so that every replacement accumulates.
    $body = "string where the word empresarios should be replaced; but the word <a href='http://www.empresarios.com'>empresarios</a> should not be replaced inside <a> tags nor in the URL of their 'src' attribute.";
    foreach ($terminos as $key => $value) {
        $tagged = "<".$value.">".$key."</".$value.">";
        // Problem: str_replace() also rewrites matches inside <a> tags and their attributes.
        $body = str_replace($key, $tagged, $body);
    }
    return $body;
}
In this example, the function should return "string where the word <h2>empresarios</h2> should be replaced; but the word <a href='http://www.empresarios.com'>empresarios</a> should not be replaced inside <a> tags nor in the URL of their 'src' attribute."
I want this replace function to work everywhere in the string, except inside tags and their attributes!
(I want to do what is described in the following thread, except that I need it in PHP rather than JavaScript: /questions/1666790/how-to-replace-text-not-within-a-specific-tag-in-javascript)
Answer 0 (score: 3)
Use the DOM and modify only the text nodes:
$s = "foo <a href='http://test.com'>foo</a> lorem bar ipsum foo. <a>bar</a> not a test";
echo htmlentities($s) . '<hr>';
$d = new DOMDocument;
$d->loadHTML($s);
$x = new DOMXPath($d);
$t = $x->evaluate("//text()");
// Word => tag used to wrap it.
$wrap = array(
    'foo' => 'h1',
    'bar' => 'h2'
);
$preg_find = '/\b(' . implode('|', array_keys($wrap)) . ')\b/';
foreach ($t as $textNode) {
    // Leave link text untouched.
    if ($textNode->parentNode->tagName == "a") {
        continue;
    }
    // Split on the words and keep them in the result (-1 means no limit;
    // passing null here is deprecated as of PHP 8.1).
    $sections = preg_split($preg_find, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
    $parentNode = $textNode->parentNode;
    foreach ($sections as $section) {
        if (!isset($wrap[$section])) {
            // Ordinary text: re-insert it as a plain text node.
            $parentNode->insertBefore($d->createTextNode($section), $textNode);
            continue;
        }
        // Matched word: insert it wrapped in its element.
        $tagName = $wrap[$section];
        $parentNode->insertBefore($d->createElement($tagName, $section), $textNode);
    }
    // The original text node has been fully replaced by the new nodes.
    $parentNode->removeChild($textNode);
}
echo htmlentities($d->saveHTML());
Edited so that each DOMText node is replaced with DOMText and DOMElement nodes as needed.
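One thing the parent check above does not cover is link text nested inside another element, such as <a><b>foo</b></a>, where the text node's direct parent is <b> rather than <a>. A minimal sketch of one way to handle that, assuming an XPath ancestor test is acceptable for your markup (the sample HTML and the variable names are made up for illustration):

```php
<?php
// Hypothetical sample: the second "foo" sits inside <b> inside <a>.
$html = "<p>foo <a href='#'><b>foo</b></a> bar</p>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select only text nodes that have no <a> element anywhere above them.
$texts = [];
foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
    $texts[] = $node->nodeValue;
}

// Only the text before and after the link is collected.
print_r($texts);
```

The same query could be passed to evaluate() in the answer above, which would make the tagName check inside the loop unnecessary.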
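Answer 1 (score: 0)
The answer you pointed to is in JS, but it works basically the same way in PHP. You just need to write the pattern as a string and drop the JavaScript g flag: PCRE has no g modifier, and preg_replace() replaces every match by default.
$regexp = "/(<pre>(?:[^<](?!\/pre))*<\/pre>)|(\:\-\))/i";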
Also note that you may need a case-insensitive preg_replace call to replace the word 'empresarios' in case it is capitalized (Empresarios) or written in some odd way (EmPreSAriOS).
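As a rough illustration of the case-insensitive point, here is a sketch that works on plain text only (it does not protect links, so it is not a full answer to the question). The function name wrapTerms is hypothetical, and it assumes the array keys are lowercase ASCII words:

```php
<?php
// Hypothetical helper: wraps each term in its tag, ignoring case,
// while keeping the casing the word had in the original text.
function wrapTerms(string $text, array $terms): string
{
    // Escape each term so it is safe inside the pattern.
    $alternatives = array_map(function ($word) {
        return preg_quote($word, '/');
    }, array_keys($terms));

    // \b = whole words only, i = case-insensitive, u = treat input as UTF-8.
    $pattern = '/\b(' . implode('|', $alternatives) . ')\b/iu';

    return preg_replace_callback($pattern, function (array $m) use ($terms) {
        // Look the tag up by the lowercased match (assumes ASCII keys).
        $tag = $terms[strtolower($m[1])];
        return "<$tag>{$m[1]}</$tag>";
    }, $text);
}

echo wrapTerms('Empresarios y EmPreSAriOS', ['empresarios' => 'h2']);
// prints: <h2>Empresarios</h2> y <h2>EmPreSAriOS</h2>
```

Using a callback rather than a fixed replacement string is what lets the original casing survive in the output.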
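Also take care with your HTML. <h2> is a block element, so this markup:
string where the word <h2>empresarios</h2> should be replaced;
may be rendered as:
string
empresarios
should be replaced;
What you may need instead is the <big> tag (note that <big> is obsolete in HTML5, so an inline element such as <span> styled with CSS is the modern equivalent).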
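Answer 2 (score: 0)
When replacing with a regex pattern, be sure to isolate the qualifying text nodes with a DOM parser first. The pattern should honor word boundaries, case-insensitivity, and Unicode characters. If you intend to specifically target words that contain Unicode characters, you will need to add the mb_ prefix to some of the string functions.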
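To make the mb_ remark concrete, here is a small sketch of a lookup on a term with a non-ASCII letter. It assumes the mbstring extension is enabled and the source is UTF-8; the term 'árbol' is a made-up example, not one from the question:

```php
<?php
// Hypothetical lookup table with a non-ASCII key.
$lookup = ['árbol' => 'h2'];

// A fragment as it might come out of preg_split(), in upper case.
$fragment = 'ÁRBOL';

// mb_strtolower() lowercases the accented letter as well.
// strtolower() only lowercases ASCII letters as of PHP 8.2,
// so it would leave the "Á" untouched and the lookup would miss.
$key = mb_strtolower($fragment, 'UTF-8');

var_dump(isset($lookup[$key])); // bool(true)
```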
Drawing on these insights, I tailored a solution to your case.
Code:
$html = <<<HTML
foo <a href='http://test.com'>fóo</a> lórem
bár ipsum bar food foo bark. <a>bar</a> not á test
HTML;
// Lowercase word => tag used to wrap it.
$lookup = [
    'foo' => 'h3',
    'bar' => 'h2'
];
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
// Escape each word so it is safe inside the pattern.
$regexNeedles = [];
foreach ($lookup as $word => $tagName) {
    $regexNeedles[] = preg_quote($word, '~');
}
// Whole words only, case-insensitive, Unicode-aware.
$pattern = '~\b(' . implode('|', $regexNeedles) . ')\b~iu';
// Visit every text node whose parent is not an <a> element.
foreach ($xpath->query('//*[not(self::a)]/text()') as $textNode) {
    $newNodes = [];
    $hasReplacement = false;
    foreach (preg_split($pattern, $textNode->nodeValue, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) as $fragment) {
        $fragmentLower = strtolower($fragment);
        if (isset($lookup[$fragmentLower])) {
            // Matched word: wrap it, keeping its original casing.
            $hasReplacement = true;
            $element = $dom->createElement($lookup[$fragmentLower]);
            $element->nodeValue = $fragment;
            $newNodes[] = $element;
        } else {
            $newNodes[] = $dom->createTextNode($fragment);
        }
    }
    // Touch the DOM only when at least one word was wrapped.
    if ($hasReplacement) {
        $newFragment = $dom->createDocumentFragment();
        foreach ($newNodes as $newNode) {
            $newFragment->appendChild($newNode);
        }
        $textNode->parentNode->replaceChild($newFragment, $textNode);
    }
}
// The parser wraps the loose text in <p>...</p>; substr() strips that wrapper.
// Note: utf8_decode() is deprecated as of PHP 8.2.
echo substr(trim(utf8_decode($dom->saveHTML($dom->documentElement))), 3, -4);
Output:
<h3>foo</h3> <a href="http://test.com">fóo</a> lórem
bár ipsum <h2>bar</h2> food <h3>foo</h3> bark. <a>bar</a> not á test