Regex to match from the first capital letter of a string to the end of the sentence, highlighting an array of words

Asked: 2011-12-20 23:27:55

Tags: php regex preg-replace preg-match-all

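I know the title is hard to understand. Basically, I have a text of, let's say, about 20,000 characters.

When I run a search, I want to extract the sentences in which any of the search words is found and highlight those words.

I have an array of words to highlight called $words, and the main text is called $text. My code is as follows: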

foreach($words as $word):

    $regex = '/[^.!?\n]*\b'.preg_quote($word,"/").'\b[^.!?\n]*/i';

    preg_match_all($regex, $text, $matches);  
    $search_q = min(3, count($matches[0]));  // show at most 3 sentences per word

    for ($i=0; $i < $search_q; $i++):
        echo preg_replace('/\b('.preg_quote($word,"/").')\b/i','<span class="highlighted">$1</span>',$matches[0][$i]).'[..]  ';
    endfor;
endforeach;

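The problem with this code is that when two words belong to the same sentence, that sentence is printed twice. I would like to print it once, with both words highlighted, but I don't know how to do that.

Thanks for the help, guys.

Update: test scenario

Let's say: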

$text="A new holiday shopping tradition: Smartphones and social networks

Many consumers will take out their phones before their wallets this holiday season with even more visiting social media sites before tackling their gift lists.

More than one-quarter (27 percent) of smartphone owners plan to use their devices for holiday shopping to search for store locations (67 percent), compare prices (59 percent) and check product availability (46 percent).  Additionally, 44 percent say they plan to use social media to seek discounts, read reviews and check family and friends’ gift lists.

“Consumers are using online and mobile platforms to make the most of their holiday budgets, and the survey indicates that they will do more than just compare prices,” said Paul.  “Retailers that use mobile and online channels to show product availability, locations and pricing but add customized promotions and gift ideas may encourage shoppers to come in the door for a specific gift and take additional items to the register.”";

And the words are:

$words=array('social','media');

With my code I get this:

A new holiday shopping tradition: Smartphones and **social** networks[..]
Many consumers will take out their phones before their wallets this holiday season with even more visiting **social** media sites before tackling their gift lists[..]
Additionally, 44 percent say they plan to use **social** media to seek discounts, read reviews and check family and friends’ gift lists[..]
Many consumers will take out their phones before their wallets this holiday season with even more visiting social **media** sites before tackling their gift lists[..]
Additionally, 44 percent say they plan to use social **media** to seek discounts, read reviews and check family and friends’ gift lists[..]

Instead, I would like:

A new holiday shopping tradition: Smartphones and **social** networks[..]
Many consumers will take out their phones before their wallets this holiday season with even more visiting **social** **media** sites before tackling their gift lists[..]
Additionally, 44 percent say they plan to use **social** **media** to seek discounts, read reviews and check family and friends’ gift lists[..]

With fge's code I get:

social[..] 
social[..] 
social[..] 
media[..] 
media[..] 

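I hope the example makes it easy to understand. Thanks a lot.

2 Answers:

Answer 0: (score: 1)

Your head will probably hurt less if you split the text into a list of sentences and check each sentence in turn. If the word list isn't too long, you can put the whole list into a single regex. Something like: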

/\b(\Qword1\E|\Qword2\E|\Qword3\E)\b/
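
The idea above could be sketched roughly as follows. This is only a sketch under some assumptions, not the answerer's code: the helper name highlight_sentences and its $limit parameter are made up for illustration, preg_quote() is used instead of \Q...\E, and sentences are cut with the same [^.!?\n] character class as in the question.

```php
<?php
// Hypothetical helper (not from the thread): returns up to $limit sentences
// that contain any of $words, with every matching word highlighted.
function highlight_sentences(string $text, array $words, int $limit = 3): array
{
    if (count($words) === 0) {
        return []; // an empty alternation would match everywhere
    }

    // One alternation for all words; preg_quote() escapes regex
    // metacharacters and the "/" delimiter.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);
    $wordRegex = '/\b(' . implode('|', $quoted) . ')\b/i';

    // Cut the text into candidate sentences (same class as in the question).
    preg_match_all('/[^.!?\n]+/', $text, $m);

    $out = [];
    foreach ($m[0] as $sentence) {
        if (count($out) >= $limit) {
            break;
        }
        $sentence = trim($sentence);
        // Keep the sentence only if at least one word occurs in it,
        // then highlight all words in a single replace.
        if ($sentence !== '' && preg_match($wordRegex, $sentence)) {
            $out[] = preg_replace(
                $wordRegex,
                '<span class="highlighted">$1</span>',
                $sentence
            );
        }
    }
    return $out;
}
```

Because each sentence is visited once and all words are replaced in the same call, a sentence containing both words is returned once with both highlighted. The limit here applies to sentences overall rather than per word, which is a design choice of this sketch.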

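Answer 1: (score: 0)

First, I don't understand why you use such a complicated regex: you already use word anchors, so why also use the complemented character classes?

Second, this solution assumes that the words do not contain special regex characters...

Here is what you can do: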

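$w = preg_quote($word, "/");
$fullword = '\b' . $w . '\b';
$regex = '/' . $fullword . '(?!.*' . $fullword . ')/i';

Explanation: preg_quote() escapes every regex metacharacter in the word, so the word is matched literally (which means you are safe if the word contains a dot). Do not also wrap the quoted word in \Q...\E: inside \Q...\E the backslashes added by preg_quote() would themselves be matched literally. So you match your word (it is anchored), and then you say that the word must not match again further on: (?!.*\bwordhere\b)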

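This means that if the word occurs several times, only the last occurrence will be matched! (Since . does not match a newline unless the s modifier is set, "last" here means the last occurrence on a given line.)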
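
A quick way to check that behaviour, as a sketch: the sample string below is invented for illustration, and the word is escaped with preg_quote() only.

```php
<?php
// Sample input made up for this check.
$line = 'social media and social networks';

$w = preg_quote('social', '/');
$fullword = '\b' . $w . '\b';
// Match the word only when no further occurrence follows on the same line.
$regex = '/' . $fullword . '(?!.*' . $fullword . ')/i';

// PREG_OFFSET_CAPTURE stores each match as [matched text, byte offset].
preg_match_all($regex, $line, $m, PREG_OFFSET_CAPTURE);

// $m[0] holds a single match: the second "social", at byte offset 17.
```

The first "social" is rejected because the lookahead finds another occurrence later on the line.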

Finally, to highlight, use:

preg_replace('/(' . $fullword . ')/i', '<span class="highlighted">$1</span>', $text);