preg_replace, but don't match if the text to replace appears inside a heading tag

Time: 2010-11-05 01:24:11

Tags: php regex preg-replace

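The function below replaces the first two occurrences of a keyword found in the content (which is HTML markup), wrapping the first in a bold tag and the second in an em tag.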

One case I need to account for: if the keyword is already inside an h1 tag, I don't want the callback to run.

Example:

<h1>This is the keyword inside a heading tag</h1>

After the replacement:

<h1>This is the <b>keyword</b> inside a heading tag</h1>

How can I change the replacement so that it skips occurrences of the keyword inside heading tags (h1-h6) and moves on to the next match?

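// Callback: wrap the first match in <b>, the second in <em>, leave the rest as-is.
function doReplace($matches)
{
    // Note: a static counter persists across calls for the whole request.
    static $count = 0;
    switch ($count++) {
        case 0: return ' <b>'.trim($matches[1]).'</b>';
        case 1: return ' <em>'.trim($matches[1]).'</em>';
        default: return $matches[1];
    }
}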

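function save_content($content)
{
    $mykeyword = "test";
    // Skip the replacement if the keyword is already bolded and emphasized.
    if ((strpos($content, "<b>".$mykeyword) !== false ||
         strpos($content, "<strong>".$mykeyword) !== false) &&
        strpos($content, "<em>".$mykeyword) !== false)
    {
        return $content;
    }

    // preg_quote() keeps any regex metacharacters in the keyword literal.
    $pattern = "/\b(?<!>)(".preg_quote($mykeyword, "/").")\b/i";
    return preg_replace_callback($pattern, "doReplace", $content);
}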

1 answer:

Answer 0: (score: 4)

Don't use regular expressions for HTML/XML; parse the markup and query it with XPath instead:

$d = new DOMDocument();
$d->loadHTML($your_html);
$x = new DOMXpath($d);
foreach($x->query("//text()[
   contains(.,'keyword')
   and not(ancestor::h1) 
   and not(ancestor::h2) 
   and not(ancestor::h3) 
   and not(ancestor::h4) 
   and not(ancestor::h5) 
   and not(ancestor::h6)]") as $node){
    // $node is a text node outside any heading; modify it as needed.
    // Note that XPath contains() is case-sensitive.
}
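
The loop above only locates the text nodes, so here is a minimal sketch of how the "first two occurrences" replacement from the question might be carried out on them. It is one possible approach, not a definitive implementation. The function name `highlightKeyword`, the wrapper id `kw-wrap`, and the sample input are assumptions made up for illustration; the sketch also assumes UTF-8 input that is reasonably well-formed HTML.

```php
<?php
// Hypothetical helper, not part of the original answer.
function highlightKeyword(string $html, string $keyword): string
{
    $tags = ['b', 'em'];   // first match gets <b>, second gets <em>
    $count = 0;            // local counter, so every call starts fresh

    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $doc = new DOMDocument();
    // The XML declaration hints UTF-8 to libxml; the wrapper <div> (assumed id)
    // lets us pull just the fragment back out afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="kw-wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $wrap = $xpath->query('//div[@id="kw-wrap"]')->item(0);
    // All text nodes that have no h1-h6 ancestor, in document order.
    $textNodes = $xpath->query(
        './/text()[not(ancestor::h1 or ancestor::h2 or ancestor::h3'
        . ' or ancestor::h4 or ancestor::h5 or ancestor::h6)]',
        $wrap
    );

    $pattern = '/\b(' . preg_quote($keyword, '/') . ')\b/iu';
    foreach (iterator_to_array($textNodes) as $node) {
        if ($count >= count($tags)) {
            break; // both replacements done
        }
        // With PREG_SPLIT_DELIM_CAPTURE, odd indexes hold the matched keyword.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1 && $count < count($tags)) {
                $el = $doc->createElement($tags[$count++]);
                $el->appendChild($doc->createTextNode($part));
                $fragment->appendChild($el);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, not the implied <html>/<body>.
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage with made-up sample input:
$html = '<h1>A test heading</h1><p>First test, second Test, third test.</p>';
echo highlightKeyword($html, 'test');
```

With this input the heading should come back unchanged, while the first two occurrences in the paragraph are wrapped in `<b>` and `<em>`. Unlike the `contains()` filter, the match here is case-insensitive, mirroring the `/i` flag in the question. The serializer may insert line breaks between block-level elements, so the output is not guaranteed to be byte-identical to the input outside the replaced spans.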