Question

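I'm running a search against some database results on my website and trying to highlight the terms in the returned results that match the search term. Here is what I have so far (in PHP):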

$highlight = trim($highlight);
if(preg_match('|\b(' . $highlight . ')\b|i', $str_content))
{
    $str_content = preg_replace('|\b(' . $highlight. ')(?!["\'])|i', "<span class=\"highlight\">$1</span>", $str_content);
    break;
}

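The downside of going this route is that if my search term also shows up in the URL permalink, the returned result has the span inserted into the href attribute, which breaks the anchor tag. Is there any way in my regex to exclude "any" text that appears between the opening and closing brackets of an HTML tag from the search?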

I know I could use the strip_tags() function and just spit out the results as plain text, but I'd rather not do that if I don't have to.

Answer 1

Don't try to parse HTML with regular expressions:
RegEx match open tags except XHTML self-contained tags

Try using PHP Simple HTML DOM.

<?php
// get DOM
$html = file_get_html('http://www.google.com/search?q=hello+kitty');

// ensure this is properly sanitized.
$term = trim($term);

// highlight $term in all <div class="result">...</div> elements
foreach($html->find('div.result') as $e){
   echo str_replace($term, '<span class="highlight">'.$term.'</span>', $e->plaintext);
}
?>

Note: this is not a complete solution, since I don't know what your HTML looks like, but it should get you pretty close.
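
As a rough sketch of the same DOM idea that keeps the markup instead of flattening it to plain text: the function below uses PHP's built-in DOMDocument and DOMXPath (swapped in here for Simple HTML DOM) and rewrites only text nodes, so attribute values such as href are never touched. The name highlight_text_nodes and the hl-wrap wrapper id are made up for illustration, and the sketch assumes ASCII input, because loadHTML may misread non-ASCII text unless an encoding is declared.

```php
<?php
// Hypothetical helper: highlight $term only inside text nodes of an HTML fragment.
function highlight_text_nodes(string $html, string $term): string
{
    $term = trim($term);
    if ($term === '') {
        return $html;
    }

    // Collect libxml warnings internally so sloppy markup does not emit PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // The wrapper div (assumed id) gives a known root to serialize from afterwards.
    $doc->loadHTML('<div id="hl-wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $wrap = $xpath->query('//div[@id="hl-wrap"]')->item(0);
    // preg_quote() escapes regex metacharacters and the '~' delimiter in the term.
    $pattern = '~\b(' . preg_quote($term, '~') . ')\b~i';

    // Copy the node list to an array first, because the tree is modified in the loop.
    foreach (iterator_to_array($xpath->query('.//text()', $wrap)) as $node) {
        // With DELIM_CAPTURE, odd indexes hold the matched term, even indexes the text between.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no match in this text node
        }
        $frag = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $span = $doc->createElement('span');
                $span->setAttribute('class', 'highlight');
                $span->appendChild($doc->createTextNode($part));
                $frag->appendChild($span);
            } elseif ($part !== '') {
                $frag->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($frag, $node);
    }

    // Serialize only the children of the wrapper, not the implied html/body shell.
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo highlight_text_nodes('<p>See <a href="/blog/kitty-photos">kitty photos</a></p>', 'kitty'), "\n";
```

The trade-off is that the parser may normalize the markup slightly (attribute quoting, unclosed tags), so the output is not guaranteed to be byte-for-byte identical to the input outside the highlighted terms.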

Answer 2

I think assertions (lookahead and lookbehind) are what you are looking for.
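
A minimal sketch of how an assertion could be used here, assuming reasonably well-formed HTML with no raw < or > inside text or attribute values: a negative lookahead rejects any match that is followed by a > before the next <, which is the case when the match sits inside a tag. The function name highlight_outside_tags is hypothetical, and this remains a regex heuristic, not an HTML parser.

```php
<?php
// Hypothetical helper: wrap whole-word matches of $term in a highlight span,
// skipping matches that sit inside an HTML tag (for example in an href attribute).
function highlight_outside_tags(string $html, string $term): string
{
    $term = trim($term);
    if ($term === '') {
        return $html;
    }
    // preg_quote() escapes regex metacharacters and the '~' delimiter in the term.
    $quoted = preg_quote($term, '~');
    // (?![^<>]*>) is a negative lookahead: the match is rejected when the next
    // angle bracket after it is '>', meaning we are still inside a tag.
    $pattern = '~\b(' . $quoted . ')\b(?![^<>]*>)~i';
    return preg_replace($pattern, '<span class="highlight">$1</span>', $html);
}

$html = '<p>See <a href="/blog/kitty-photos">kitty photos</a></p>';
echo highlight_outside_tags($html, 'kitty'), "\n";
// prints: <p>See <a href="/blog/kitty-photos"><span class="highlight">kitty</span> photos</a></p>
```

Escaping the term with preg_quote also keeps user input such as "c++" or "a|b" from being interpreted as regex syntax.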

Answer 3

I ended up going this route, which so far works well for this particular situation.

<?php

if(preg_match('|\b(' . $term . ')\b|i', $str_content))
{
    $str_content = strip_tags($str_content);
    $str_content = preg_replace('|\b(' . $term . ')(?!["\'])|i', "<span class=\"highlight\">$1</span>", $str_content);
    $str_content = preg_replace('|\n[^<]+|', '</p><p>', $str_content);
    break;
}

?>

It is still HTML-encoded, but with the HTML tags stripped it is now much easier to parse.

Highlighting search terms in PHP without breaking anchor tags using regex

3 answers: