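I'm trying to write a script which looks for keywords in a text and replaces them with anchor tags (links).
Now I want to change the text only inside paragraphs (<p>), so that the heading (H) tags stay as plain text.
So I need to write the preg_replace pattern in a way that it replaces text only inside paragraphs.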
<?php
$keywords = array(
    'keywords' => 'www.1.com',
    'hello' => 'www.2.com',
    'there' => 'www.3.com',
    'are' => 'www.4.com',
);
$sentence = '
<h1>Hello</h1>
<h2>Hello there blablabla</h2>
<p>Hello, there are keywords</p>
<p>Hello, there are keywords</p>
<p>Hello, there are keywords</p>
<p>Hello, there are keywords</p>
<p>Hello, there are keywords</p>
';
foreach ($keywords as $word => $link) {
    $sentence = preg_replace('@(?<=\W|^)('.$word.')(?=\W|$)@i', '<a href="'.$link.'">$1</a>', $sentence, 1);
}
echo $sentence;
?>
Answer 0 (score: 0)
This may come close to what you are looking for:
<?php
$keywords = array(
    'keywords' => 'www.1.com',
    'hello' => 'www.2.com',
    'there' => 'www.3.com',
    'are' => 'www.4.com',
);
$sentence = <<<TEXT
<h1>Hello, here we have a keyword!</h1>
<h2>Hello there is some sub title</h2>
<p>Hello, there are some hidden keywords in here</p>
<p>Hello, here are some additional keywords</p>
<p>Hello, there we have more keywords</p>
<p>Hello, and keywords</p>
<p>Hello, but no more hello wherever you look...</p>
<p>Hello, there is also a <a href="...">link</a> in here!</p>
TEXT;
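foreach ($keywords as $word => $link) {
    // Match a whole single-line <p>...</p>. Because the first .* is greedy,
    // the last occurrence of the keyword on that line is the one that gets wrapped.
    $pattern = '|^(<p[^>]*>.*)(' . preg_quote($word, '|') . ')(.*</p>)$|mui';
    $replace = '$1<a href="' . $link . '">$2</a>$3';
    $sentence = preg_replace($pattern, $replace, $sentence);
}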
echo $sentence;
The output of the above code is:
<h1>Hello, here we have a keyword!</h1>
<h2>Hello there is some sub title</h2>
<p><a href="www.2.com">Hello</a>, <a href="www.3.com">there</a> <a href="www.4.com">are</a> some hidden <a href="www.1.com">keywords</a> in here</p>
<p><a href="www.2.com">Hello</a>, here <a href="www.4.com">are</a> some additional <a href="www.1.com">keywords</a></p>
<p><a href="www.2.com">Hello</a>, <a href="www.3.com">there</a> we have more <a href="www.1.com">keywords</a></p>
<p><a href="www.2.com">Hello</a>, and <a href="www.1.com">keywords</a></p>
<p>Hello, but no more <a href="www.2.com">hello</a> wherever you look...</p>
<p><a href="www.2.com">Hello</a>, <a href="www.3.com">there</a> is also a <a href="...">link</a> in here!</p>
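However, you will never get an absolutely robust solution from an approach based on regular expressions alone. You should consider using a DOM parser to deal with the complexity of HTML markup. Inside the parsed elements you can then apply the pattern replacement.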
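To make the robustness caveat concrete, here is a minimal sketch of two inputs on which the pattern above goes wrong. The two sample strings are made up for illustration; only the keyword "hello" and its link are taken from the lookup array.

```php
<?php
// Same pattern as above, built for the single keyword "hello".
$pattern = '|^(<p[^>]*>.*)(' . preg_quote('hello', '|') . ')(.*</p>)$|mui';
$replace = '$1<a href="www.2.com">$2</a>$3';

// Case 1: the paragraph spans two lines. "." does not match a newline and
// ^/$ anchor per line, so nothing matches and the text stays unlinked.
$multiline = "<p>Hello,\nthere are keywords</p>";
$unchanged = preg_replace($pattern, $replace, $multiline);

// Case 2: the keyword only occurs inside an attribute value. The pattern
// cannot tell markup from text, so it injects a link into the href.
$withAttribute = '<p>See <a href="/hello">this page</a></p>';
$broken = preg_replace($pattern, $replace, $withAttribute);

var_dump($unchanged === $multiline);
echo $broken, PHP_EOL;
```

The first case silently skips a paragraph, the second produces invalid markup. Both follow from the regex treating HTML as flat text.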
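Answer 1 (score: 0)
You should not get into the habit of trying to parse valid HTML with regex. Parsing HTML with a legitimate DOM parsing library is more reliable. I prefer DOMDocument. If your actual input string does not have a parent/containing element around the posted elements, you will need to stabilize the DOM structure by wrapping the input string in <div></div> tags, then strip the wrapper once processing is finished.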
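The wrap-then-unwrap step could look roughly like this. It is a sketch, not part of the original demo: the `$fragment` string and the `$result` variable are illustrative names, and the processing step in the middle is left out.

```php
<?php
// An input without a single containing element (illustrative sample).
$fragment = '<h1>Hello</h1><p>Hello, there are keywords</p>';

$dom = new DOMDocument();
// Wrap in a temporary <div> so the document has exactly one root element.
$dom->loadHTML(
    '<div>' . $fragment . '</div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

// ... modify the <p> elements here ...

// Serialize only the children of the wrapper, which drops the <div> again.
$result = '';
foreach ($dom->documentElement->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}

echo $result;
```

Serializing the child nodes one by one avoids having to cut the wrapper off with string functions afterwards. Depending on the libxml version, the serializer may insert line breaks between block-level elements.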
Use getElementsByTagName() to isolate the <p> tags in the document, then loop over this refined collection.
Use preg_replace_callback() to perform the replacements based on your lookup array.
Code: (Demo)
$keywords = [
    'keywords' => 'www.1.com',
    'hello' => 'www.2.com',
    'there' => 'www.3.com',
    'are' => 'www.4.com',
];
$sentence = '
<div>
<h1>Hello</h1>
<h2>Hello there blablabla</h2>
<p>Hello, there are keywords</p>
<p>Hello, there are keywords</p>
<p>Hello, there are keywords</p>
<p>Hello, there are keywords</p>
<p>Hello, there are keywords</p>
</div>';
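$dom = new DOMDocument;
// The flags stop libxml from adding <html>/<body> wrappers and a doctype.
$dom->loadHTML($sentence, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($dom->getElementsByTagName('p') as $p) {
    $p->nodeValue = preg_replace_callback(
        // Whole-word, case-insensitive alternation of all (escaped) keywords.
        '~\b(?:' . implode('|', array_map('preg_quote', array_keys($keywords))) . ')\b~i',
        function($m) use ($keywords) {
            // Look up the replacement by the lowercased match.
            return $keywords[strtolower($m[0])];
        },
        $p->nodeValue
    );
}
echo $dom->saveHTML();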
Output:
<div>
<h1>Hello</h1>
<h2>Hello there blablabla</h2>
<p>www.2.com, www.3.com www.4.com www.1.com</p>
<p>www.2.com, www.3.com www.4.com www.1.com</p>
<p>www.2.com, www.3.com www.4.com www.1.com</p>
<p>www.2.com, www.3.com www.4.com www.1.com</p>
<p>www.2.com, www.3.com www.4.com www.1.com</p>
</div>
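Note that the code above swaps each keyword for its bare URL rather than wrapping it in a link, and assigning markup to nodeValue would not create elements anyway. One possible way to get real anchor tags, as the question asks, is to split each text node and build the <a> elements through the DOM. The following is a sketch under that assumption, not the answerer's code. The shortened `$html` sample and the XPath expression are illustrative choices, and unlike the question's code (which uses a limit of 1) it links every occurrence.

```php
<?php
$keywords = [
    'keywords' => 'www.1.com',
    'hello'    => 'www.2.com',
    'there'    => 'www.3.com',
    'are'      => 'www.4.com',
];

$html = '<div><h1>Hello</h1><p>Hello, there are keywords</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// One capturing group around the alternation so preg_split() returns the matches.
$quoted  = array_map(function ($w) { return preg_quote($w, '~'); }, array_keys($keywords));
$pattern = '~\b(' . implode('|', $quoted) . ')\b~i';

// Text nodes inside <p> that are not already part of a link.
// Copy to an array first because the loop modifies the tree.
$xpath     = new DOMXPath($dom);
$textNodes = iterator_to_array($xpath->query('//p//text()[not(ancestor::a)]'));

foreach ($textNodes as $textNode) {
    // Odd indexes hold the matched keywords, even indexes the text in between.
    $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
    if (count($parts) === 1) {
        continue; // no keyword in this text node
    }
    $fragment = $dom->createDocumentFragment();
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            $a = $dom->createElement('a');
            $a->setAttribute('href', $keywords[strtolower($part)]);
            $a->appendChild($dom->createTextNode($part));
            $fragment->appendChild($a);
        } elseif ($part !== '') {
            $fragment->appendChild($dom->createTextNode($part));
        }
    }
    $textNode->parentNode->replaceChild($fragment, $textNode);
}

$result = $dom->saveHTML();
echo $result;
```

Because only text nodes under <p> are touched, the headings stay as they are, and keywords inside attributes or existing links are never rewritten.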