Question

I have a text and I want to highlight certain phrases in it. Highlighting is done by making the phrase bold: <b>phrase</b>.

So I created an array containing the phrases that must be highlighted. See below:

$phrases = ['iphone 7 plus', 'iphone 7'];

I wrote a function that highlights my phrases.

function highlight_phrases($string, $phrases, $tag = 'strong')
{
    foreach($phrases as $phrase) {
        $string = preg_replace('/' . $phrase . '/i', '<' . $tag . '>$0</' . $tag . '>', $string);
    }    
    return $string;
}

Now I have the following text:

This is some text about the iPhone 7 and this i really a nice peace of engineering.

This is turned into:

This is some text about the <strong>iPhone 7</strong> and this i really a nice peace of engineering.

OK, so far so good!

Now I have a different text:

We are now talking about the iPhone 7 Plus, which is very big!

This is where it goes wrong. It becomes:

We are now talking about the <strong><strong>iPhone 7</strong> Plus</strong>, which is very big!

When this HTML is rendered on screen it looks fine, but the markup itself is wrong because there is a strong tag nested inside another strong tag.

How can I solve this?

Note: the $phrases array can become very large; every phone you can think of might end up in it as a phrase.

Answer 1

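You need to build an alternation-based regex dynamically, running each item through preg_quote (to escape all special regex metacharacters automatically) and sorting the items by length in descending order (otherwise a shorter phrase will prevent a longer one from matching, much as happens in your original code). For the 2 search phrases in question, the expression will look like /iphone 7 plus|iphone 7/i. A single replacement with this pattern should take the place of the foreach loop, which matches the same text more than once.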

Here is a sample PHP demo:

function highlight_phrases($string, $phrases, $tag = 'strong')
{
    usort($phrases, function($a,$b){
        return strlen($b)-strlen($a);
    });
    //print_r($phrases); // => Array ( [0] => iphone 7 plus [1] => iphone 7 )
    $pattern = '/' . implode("|", array_map(function ($x) { 
        return preg_quote($x, '/'); 
    }, $phrases)) . '/i';
    // echo "$pattern"; // =>  /iphone 7 plus|iphone 7/i
    return preg_replace($pattern, '<' . $tag . '>$0</' . $tag . '>', $string);
}

$phrases = ['iphone 7', 'iphone 7 plus'];
$s = "This is some text about the iPhone 7 and this i really a nice peace of engineering. We are now talking about the iPhone 7 Plus, which is very big!";
echo highlight_phrases($s, $phrases);
// => This is some text about the <strong>iPhone 7</strong> and this i really a nice peace of engineering. We are now talking about the <strong>iPhone 7 Plus</strong>, which is very big!

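A few words about the pattern: in an NFA regex engine, an unanchored alternation group matches with the first alternative branch that succeeds, unlike a POSIX engine, which looks for the longest match. That is why longer search phrases must come before shorter ones. See Remember That The Regex Engine Is Eager.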
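
To see the effect of branch order, here is a minimal sketch that runs both orderings of the two alternatives against the second text from the question. The variable names $shortFirst and $longFirst are only illustrative.

```php
<?php
$text = 'We are now talking about the iPhone 7 Plus, which is very big!';

// Shorter alternative first: the engine accepts "iPhone 7" and never tries the longer branch.
preg_match('/iphone 7|iphone 7 plus/i', $text, $shortFirst);

// Longer alternative first: the whole phrase is matched.
preg_match('/iphone 7 plus|iphone 7/i', $text, $longFirst);

echo $shortFirst[0], "\n"; // iPhone 7
echo $longFirst[0], "\n";  // iPhone 7 Plus
```

Both patterns contain the same alternatives; only their order differs, which is why the phrases have to be sorted by length before the pattern is assembled.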

Answer 2

Your function currently uses preg_replace() as nothing more than a str_replace().

Use the power of regular expressions:

$phrases = ['iphone 7(?: plus)?'];

This searches for "iphone 7", optionally followed by " plus". The final question mark makes the part between the parentheses optional.

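(And the ?: makes the group non-capturing, so it does not show up as $1 in the replacement string, the second argument of preg_replace.)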
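
As a rough sketch of how this pattern behaves, the snippet below feeds it to the function from the question, reproduced unchanged so the example runs on its own. It relies on each phrase being inserted into the regex as a raw fragment, so it would presumably not work as-is in combination with preg_quote. The shortened first sentence and the variables $short and $long are just for illustration.

```php
<?php
// The function from the question, unchanged: each phrase is used as a raw regex fragment.
function highlight_phrases($string, $phrases, $tag = 'strong')
{
    foreach ($phrases as $phrase) {
        $string = preg_replace('/' . $phrase . '/i', '<' . $tag . '>$0</' . $tag . '>', $string);
    }
    return $string;
}

// One pattern covers both "iphone 7" and "iphone 7 plus", so no text is wrapped twice.
$phrases = ['iphone 7(?: plus)?'];

$short = highlight_phrases('This is some text about the iPhone 7.', $phrases);
$long  = highlight_phrases('We are now talking about the iPhone 7 Plus, which is very big!', $phrases);

echo $short, "\n"; // This is some text about the <strong>iPhone 7</strong>.
echo $long, "\n";  // We are now talking about the <strong>iPhone 7 Plus</strong>, which is very big!
```

The trade-off is that every family of related phrases has to be merged into one pattern by hand, which may become tedious if the $phrases array grows large.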

PHP: highlight phrases in a string
