Question

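I'm trying to write a script that looks for keywords in a text and replaces them with anchor tags (links). Now I want to change the text only inside paragraphs (<p>), so that the heading (H) tags are left as plain text. I therefore need a preg_replace pattern that replaces text only inside paragraphs.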

<?php

$keywords = array(
        'keywords' => 'www.1.com',
        'hello' => 'www.2.com',
        'there' => 'www.3.com',
        'are' => 'www.4.com',

    );

$sentence = '
            <h1>Hello</h1>
            <h2>Hello there blablabla</h2>
            <p>Hello, there are keywords</p>
            <p>Hello, there are keywords</p>
            <p>Hello, there are keywords</p>
            <p>Hello, there are keywords</p>
            <p>Hello, there are keywords</p>
';

foreach ($keywords as $word => $link) {
    $sentence = preg_replace('@(?<=\W|^)('.$word.')(?=\W|$)@i', '<a href="'.$link.'">$1</a>', $sentence, 1);
}

echo $sentence;

?>

Answer 1

This may be close to what you are looking for:

<?php
$keywords = array(
    'keywords' => 'www.1.com',
    'hello' => 'www.2.com',
    'there' => 'www.3.com',
    'are' => 'www.4.com',
);

$sentence = <<<TEXT
<h1>Hello, here we have a keyword!</h1>
<h2>Hello there is some sub title</h2>
<p>Hello, there are some hidden keywords in here</p>
<p>Hello, here are some additional keywords</p>
<p>Hello, there we have more keywords</p>
<p>Hello, and keywords</p>
<p>Hello, but no more hello wherever you look...</p>
<p>Hello, there is also a <a href="...">link</a> in here!</p>

TEXT;

foreach ($keywords as $word => $link) {
    $pattern = '|^(<p[^>]*>.*)(' . preg_quote($word, '|') . ')(.*</p>)$|mui';
    $replace = '$1<a href="' . $link . '">$2</a>$3';
    $sentence = preg_replace($pattern, $replace, $sentence);
}
echo $sentence;

The output of the above code is:

<h1>Hello, here we have a keyword!</h1>
<h2>Hello there is some sub title</h2>
<p><a href="www.2.com">Hello</a>, <a href="www.3.com">there</a> <a href="www.4.com">are</a> some hidden <a href="www.1.com">keywords</a> in here</p>
<p><a href="www.2.com">Hello</a>, here <a href="www.4.com">are</a> some additional <a href="www.1.com">keywords</a></p>
<p><a href="www.2.com">Hello</a>, <a href="www.3.com">there</a> we have more <a href="www.1.com">keywords</a></p>
<p><a href="www.2.com">Hello</a>, and <a href="www.1.com">keywords</a></p>
<p>Hello, but no more <a href="www.2.com">hello</a> wherever you look...</p>
<p><a href="www.2.com">Hello</a>, <a href="www.3.com">there</a> is also a <a href="...">link</a> in here!</p>

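However, you will never get an absolutely robust solution from a purely regex-based approach. You should consider using a DOM parser to handle the complexity of HTML markup. You can then apply the pattern replacement inside the parsed elements.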
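
If a DOM parser is not an option, some limits of the pattern above can be eased while staying with regex: it links at most one occurrence of each keyword per line, it expects each paragraph to sit on a single line, and it matches keywords inside longer words. The following is only a minimal sketch, not a robust solution. The function name linkKeywordsInParagraphs is hypothetical, it assumes PHP 7.4+ (arrow functions) and lowercase ASCII keys in $keywords, and it remains subject to the caveat above.

```php
<?php
// Hypothetical helper (not from the answer above): link keywords only inside <p>...</p>.
function linkKeywordsInParagraphs(string $html, array $keywords): string
{
    // Build one alternation of all keywords, escaped for a "~"-delimited pattern.
    $words = array_map(fn($w) => preg_quote($w, '~'), array_keys($keywords));
    $wordPattern = '~\b(?:' . implode('|', $words) . ')\b~iu';

    // Stage 1: find each paragraph; the "s" flag lets "." span line breaks.
    return preg_replace_callback(
        '~(<p\b[^>]*>)(.*?)(</p>)~is',
        function (array $p) use ($keywords, $wordPattern) {
            // Stage 2: link every whole-word keyword occurrence in the paragraph body.
            // Assumption: the array keys are lowercase, so strtolower() finds the URL.
            $body = preg_replace_callback(
                $wordPattern,
                fn(array $m) => '<a href="' . $keywords[strtolower($m[0])] . '">' . $m[0] . '</a>',
                $p[2]
            );
            return $p[1] . $body . $p[3];
        },
        $html
    );
}
```

Because all keywords are replaced in a single pass, the inserted links are not scanned again for further keywords. The sketch still does not understand markup: a keyword that appears inside a tag or attribute within a paragraph, or inside an existing link, would also be wrapped.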

Answer 2

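You should not get into the habit of trying to parse valid HTML with regex. Parsing HTML with a legitimate DOM parsing library is more reliable. I prefer DOMDocument. If your actual input string has no parent/containing element for the posted elements, you will need to stabilize the DOM structure by wrapping the input string in <div></div> tags, then strip the wrapper once processing is finished.
Use getElementsByTagName() to isolate the <p> tags in the document, then loop over that narrowed collection.
Use preg_replace_callback() to perform the replacements based on your lookup array.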

Code:

$keywords = [
    'keywords' => 'www.1.com',
    'hello' => 'www.2.com',
    'there' => 'www.3.com',
    'are' => 'www.4.com',
];

$sentence = '
<div>
    <h1>Hello</h1>
    <h2>Hello there blablabla</h2>
    <p>Hello, there are keywords</p>
    <p>Hello, there are keywords</p>
    <p>Hello, there are keywords</p>
    <p>Hello, there are keywords</p>
    <p>Hello, there are keywords</p>
</div>';

$dom = new DOMDocument;
$dom->loadHTML($sentence, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($dom->getElementsByTagName('p') as $p) {
    $p->nodeValue = preg_replace_callback(
        '~\b(?:' . implode('|', array_keys($keywords)) . ')\b~i',
        function($m) use ($keywords) {
            return $keywords[strtolower($m[0])];
        },
        $p->nodeValue
    );
}
echo $dom->saveHTML();

Output:

<div>
    <h1>Hello</h1>
    <h2>Hello there blablabla</h2>
    <p>www.2.com, www.3.com www.4.com www.1.com</p>
    <p>www.2.com, www.3.com www.4.com www.1.com</p>
    <p>www.2.com, www.3.com www.4.com www.1.com</p>
    <p>www.2.com, www.3.com www.4.com www.1.com</p>
    <p>www.2.com, www.3.com www.4.com www.1.com</p>
</div>
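
Note that the snippet above swaps each keyword for its URL as plain text, and that assigning to nodeValue replaces a paragraph's child nodes with a single text node, so any nested markup is flattened. If actual links are wanted, as in the question, one possible approach is to split each text node and build <a> elements through the DOM. This is a rough sketch under assumptions: the function name linkKeywordsWithDom is hypothetical, the input has a single root element (as with the <div> wrapper), the keys of $keywords are lowercase ASCII, and PHP 7.4+ is available.

```php
<?php
// Hypothetical helper: wrap keywords found in <p> text in real <a> elements.
function linkKeywordsWithDom(string $html, array $keywords): string
{
    $dom = new DOMDocument();
    // Assumes $html has a single root element, e.g. the <div> wrapper shown above.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // One capturing group so preg_split() returns the matched keywords as well.
    $words = array_map(fn($w) => preg_quote($w, '~'), array_keys($keywords));
    $pattern = '~\b(' . implode('|', $words) . ')\b~i';

    $xpath = new DOMXPath($dom);
    // Text nodes inside <p>, skipping text that already sits inside a link.
    $textNodes = iterator_to_array($xpath->query('//p//text()[not(ancestor::a)]'));

    foreach ($textNodes as $text) {
        // Even indexes hold plain text, odd indexes hold the matched keywords.
        $parts = preg_split($pattern, $text->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no keyword in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $a = $dom->createElement('a');
                $a->setAttribute('href', $keywords[strtolower($part)]);
                $a->appendChild($dom->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        // Swap the original text node for the mixed text/link fragment.
        $text->parentNode->replaceChild($fragment, $text);
    }

    return $dom->saveHTML();
}
```

Calling linkKeywordsWithDom($sentence, $keywords) with the variables from the snippet above should leave the headings untouched and link the keywords in each paragraph. Working on text nodes rather than on the paragraph's nodeValue keeps existing child elements, such as links, in place.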
