How to crop the text around a search term, ignoring HTML tags

Time: 2015-02-27 10:06:16

Tags: php html

I have a string that contains both HTML and text, plus a search term. I want to get a cropped piece of text "around" $searchword.

Example text:

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. <sometag>At</sometag> vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

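If $searchword is "vero", the output should be:

...sed diam voluptua. At <strong>vero</strong> eos et accusam et...

So I want X characters before and after the search term, not counting the HTML. I don't know how to start. I know we need a substr function and a regular expression, but I'm stuck.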

3 answers:

Answer 0: (score: 2)

// The string to search in
$text = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis ex.';

// The text to search for
$search_query = 'consectetur';

// The regular expression
// preg_quote() escapes the search text so it cannot conflict with the regular expression;
// its second argument also escapes the "/" delimiter
// This expression matches 3 words (punctuation included) before and after the searched keyword
$search = '/((\w+[^\w]+){3})(' . preg_quote($search_query, '/') . ')(([^\w]+\w+){3})/i';

// Find the first match of the expression and store it in $matches
if (preg_match($search, $text, $matches)) {
    // Use the results to generate the string you desire.
    $result = sprintf('...%s<strong>%s</strong>%s...', $matches[1], $matches[3], $matches[4]);
}
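
The snippet above runs on plain text, while the question's input contains markup. As a rough sketch only, the same word-based idea could be wrapped in a function that strips the tags first. The function name excerptByWords and its parameters are made up for illustration, and {0,N} is used instead of {N} so that a keyword with fewer than N words on one side is still found:

```php
<?php
// Hypothetical helper: the name and signature are assumptions, not from the answer.
function excerptByWords(string $html, string $query, int $words = 3): ?string
{
    // Remove the markup first so that tags are not counted as words.
    $text = strip_tags($html);

    // Up to $words words (with trailing/leading punctuation) on each side.
    $pattern = '/((?:\w+\W+){0,' . $words . '})('
        . preg_quote($query, '/')
        . ')((?:\W+\w+){0,' . $words . '})/iu';

    if (!preg_match($pattern, $text, $m)) {
        return null; // keyword not found (or invalid UTF-8 input)
    }

    return sprintf('...%s<strong>%s</strong>%s...', $m[1], $m[2], $m[3]);
}

$html = 'sed diam voluptua. <sometag>At</sometag> vero eos et accusam et justo.';
echo excerptByWords($html, 'vero'), "\n";
// ...diam voluptua. At <strong>vero</strong> eos et accusam...
```

Returning null when nothing matches lets the caller decide what to show instead of an empty excerpt.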

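Answer 1: (score: 1)

Tim's solution works fine, but here is a slightly different one that matches m characters before and n characters after the given word, rather than n words: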

$string = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. <sometag>At</sometag> vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.";
$string = strip_tags($string); // strip html tags
$word = 'vero';
$replace = "<strong>$word</strong>";
$before = 22; // maximum number of characters to match before the word
$after = 7; // maximum number of characters to match after the word

// preg_quote() keeps special characters in $word from being read as regex syntax;
// {0,n} still matches when the word is closer than n characters to either end
$quoted = preg_quote($word, '/');

if (preg_match('/.{0,'.$before.'}'.$quoted.'.{0,'.$after.'}/', $string, $matches)) {
    echo '...' . preg_replace('/'.$quoted.'/', $replace, $matches[0]) . '...';
}
// prints "...sed diam voluptua. At <strong>vero</strong> eos et..." for $before = 22 and $after = 7
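
For comparison, the same character-based crop can be done without a regular expression. The following is only a sketch under a few assumptions: cropAround is a hypothetical name, the search is case-insensitive, and stripos()/substr() count bytes, so this version is only safe for single-byte text such as ASCII (for UTF-8 input, mb_stripos() and mb_substr() from the mbstring extension would be the counterparts). It also adds the "..." only on a side where text was really cut off:

```php
<?php
// Hypothetical helper, not part of the answer above.
// Counts bytes, so it is only safe for single-byte text such as ASCII.
function cropAround(string $html, string $word, int $before = 22, int $after = 7): ?string
{
    $text = strip_tags($html);
    if ($word === '') {
        return null; // nothing to search for
    }

    $pos = stripos($text, $word);
    if ($pos === false) {
        return null; // word not found
    }

    $len   = strlen($word);
    $start = max(0, $pos - $before);
    $end   = min(strlen($text), $pos + $len + $after);

    $prefix = substr($text, $start, $pos - $start);
    $found  = substr($text, $pos, $len); // keeps the casing used in the text
    $suffix = substr($text, $pos + $len, $end - $pos - $len);

    return ($start > 0 ? '...' : '')
        . $prefix . '<strong>' . $found . '</strong>' . $suffix
        . ($end < strlen($text) ? '...' : '');
}

$html = 'aliquyam erat, sed diam voluptua. <sometag>At</sometag> vero eos et accusam.';
echo cropAround($html, 'vero'), "\n";
// ...sed diam voluptua. At <strong>vero</strong> eos et...
echo cropAround($html, 'aliquyam', 5, 5), "\n";
// <strong>aliquyam</strong> erat...
```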

Answer 2: (score: 0)

Step 1: Strip the HTML tags. Step 2: Wrap every occurrence of the search term.

$text = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. <sometag>At</sometag> vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.';

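// The search term from the question
$searchword = 'vero';

// Step 1: strip the HTML tags
$plainText = strip_tags($text);

// Step 2: wrap every (case-sensitive) occurrence of the search term
$resultText = str_replace($searchword, '<strong>' . $searchword . '</strong>', $plainText);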
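
Note that these two steps highlight the term but do not crop the text, which is what the question asks for. A possible way to add the cropping, shown only as a sketch: highlightExcerpt, its $chars parameter and the entity handling are assumptions for illustration. strip_tags() leaves entities such as &amp; in place, so the sketch decodes them before counting characters and escapes the pieces again before output:

```php
<?php
// Hypothetical helper: name, defaults and behaviour are assumptions.
function highlightExcerpt(string $html, string $searchword, int $chars = 30): ?string
{
    if ($searchword === '') {
        return null;
    }

    // Step 1: strip the tags, then decode entities so "&amp;" counts as one character.
    $plain = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Step 2: capture up to $chars characters on each side of the first occurrence.
    // Flags: i = case-insensitive, s = "." also matches newlines, u = UTF-8 mode.
    $pattern = '/(.{0,' . $chars . '})(' . preg_quote($searchword, '/')
        . ')(.{0,' . $chars . '})/isu';
    if (!preg_match($pattern, $plain, $m)) {
        return null;
    }

    // Step 3: escape again, because the decoded text may contain "<" or "&".
    $esc = function (string $s): string {
        return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    };

    return '...' . $esc($m[1]) . '<strong>' . $esc($m[2]) . '</strong>' . $esc($m[3]) . '...';
}

echo highlightExcerpt('Stet clita <b>kasd</b> &amp; Vero eos', 'vero', 10), "\n";
// ...ta kasd &amp; <strong>Vero</strong> eos...
```

Only the first occurrence is used here; preg_match_all() would be the place to start if every occurrence needs its own excerpt.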