How to add a restriction to a regular expression

Asked: 2019-04-16 17:09:22

Tags: php regex preg-replace

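I have a regex function that lets me replace the Xth occurrence of a word in a text. I am trying to add a condition: if the word is inside an <h1>, <h2>, or <h3> tag, or in an image alt attribute, do not replace it. Can someone help me edit the function to add this condition?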

public function str_ireplace_n($search, $replace, $subject, $occurrence)
{
    $search = preg_quote($search);
    return preg_replace("/^((?:(?:.*?$search){" . --$occurrence . "}.*?))$search/i", "$1$replace", $subject);
}

Example:

$text = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. <h1>Lorem ipsum dolor sit</h1> Proin libero erat, malesuada eget volutpat vitae, efficitur vitae ipsum. Vivamus et <h2>Lorem ipsum dolor sit</h2> justo non quam laoreet euismod. Ut eget dapibus ligula. <img src="url" alt="Lorem ipsum dolor sit"/> Vestibulum vestibulum.';

// Replace the second "Lorem" in this text with a link
$text = $this->str_ireplace_n('Lorem', ' <a href="' . $domain . '" alt="">Lorem</a> ', $text, 2); // 2 for the second occurrence

// The result adds a link to the "Lorem" inside the <h1>, which I want to avoid.
// I want the regex to do nothing when the keyword is inside an h1, an h2, or an image alt.

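I do not choose which "Lorem" gets replaced; the occurrence number is arbitrary. When that occurrence falls inside an <h1>/<h2> or an image alt attribute, I need to make sure nothing is replaced.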

Thanks in advance.

1 Answer:

Answer 0: (score: 1)

Personally, I would start with something like preg_split:

$string = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. <h1>Lorem ipsum dolor sit</h1> Proin libero erat, malesuada eget volutpat vitae, efficitur vitae ipsum. Vivamus et <h2>Lorem ipsum dolor sit</h2> justo non quam laoreet euismod. Ut eget dapibus ligula. <img src="url" alt="Lorem ipsum dolor sit"/> Vestibulum vestibulum.';

$split = preg_split('/(<[^\/]+(?:\/|<\/[^>]+)>)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);

This gives you the following (which is the basic thing we need):

Array
(
    [0] => Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
    [1] => <h1>Lorem ipsum dolor sit</h1>
    [2] =>  Proin libero erat, malesuada eget volutpat vitae, efficitur vitae ipsum. Vivamus et 
    [3] => <h2>Lorem ipsum dolor sit</h2>
    [4] =>  justo non quam laoreet euismod. Ut eget dapibus ligula. 
    [5] => <img src="url" alt="Lorem ipsum dolor sit"/>
    [6] =>  Vestibulum vestibulum.
)

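Now the tagged items are isolated from the rest of the text. We can loop over this array and check whether the leading character is < to know whether a piece is a tag or plain text. This should work as long as your tags end with </...> or />.

Basically, the HTML tag plus its content becomes the delimiter, and we capture it as well.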
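
The loop described above could be sketched roughly as follows. This is only an illustration: the helper name classify_segments and the shortened sample string are assumptions, not part of the original answer, and it assumes plain text never begins with a literal <.

```php
<?php
// Hypothetical helper (not from the original answer): label each piece
// returned by preg_split as 'tag' or 'text'.
function classify_segments(array $split): array
{
    $labelled = [];
    foreach ($split as $piece) {
        // A captured delimiter always starts with '<'; plain text is assumed not to.
        $type = (strpos($piece, '<') === 0) ? 'tag' : 'text';
        $labelled[] = [$type, $piece];
    }
    return $labelled;
}

// Shortened sample input, made up for this sketch.
$string = 'Intro text. <h1>Lorem ipsum</h1> Body text. <img src="url" alt="Lorem"/> End.';
$split = preg_split('/(<[^\/]+(?:\/|<\/[^>]+)>)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);

// Print one line per piece, prefixed with its label.
foreach (classify_segments($split) as [$type, $piece]) {
    echo str_pad($type, 5), '| ', $piece, PHP_EOL;
}
```

Only the pieces labelled text are candidates for replacement; the tag pieces are passed through unchanged.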

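The point is that a simple regex cannot parse HTML, because HTML is not a regular language. So we have to do some of the work in PHP to tie things together. As I did here, we can use a simple regex to break the problem down and simplify it.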

$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. <h1>Lorem ipsum dolor sit</h1> Proin libero erat, malesuada eget volutpat vitae, efficitur vitae ipsum. Vivamus et <h2>Lorem ipsum dolor sit</h2> Lorem justo non quam laoreet euismod. Ut eget dapibus ligula. <img src="url" alt="Lorem ipsum dolor sit"/> Vestibulum vestibulum.';

//word to replace
$search = 'Lorem';
//stuff to replace with
$replace = '<a href="Lorem">foo</a>';
 //what match to replace
$occurrence = 2;

function str_ireplace_n($search, $replace, $subject, $occurrence){
    $search = preg_quote($search, '/'); //also escape the / delimiter used in the patterns below

    //separate the HTML from the "body" text
    $split = preg_split('/(<(?:h1|h2|h3|img)[^\/]+(?:\/|<\/[^>]+)>)/', $subject, -1, PREG_SPLIT_DELIM_CAPTURE);
    //the number of current matches
    $match = 0;

    foreach($split as &$s){
        //if strpos < is 0 it's the first character - meaning its part of HTML (we don't want that)
        //if it matches search 
        if(0 !== strpos($s,'<') && preg_match('/\b'.$search.'\b/i', $s)){
            //increment the match counter
            ++$match;
             //replace the match if it's the nth one
            if($match == $occurrence)  $s = preg_replace('/\b'.$search.'\b/i',$replace,$s);
        }
    }

    return implode($split);
}

echo str_ireplace_n($search, $replace, $subject, $occurrence);

Output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. <h1>Lorem ipsum dolor sit</h1> 
 Proin libero erat, malesuada eget volutpat vitae, efficitur vitae ipsum. Vivamus et 
  <h2>Lorem ipsum dolor sit</h2> <a href="Lorem">foo</a> justo non quam laoreet euismod. 
  Ut eget dapibus ligula. <img src="url" alt="Lorem ipsum dolor sit"/> Vestibulum vestibulum.

This is the part that was replaced: <a href="Lorem">foo</a>

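I added a few line breaks (in the output) for readability, and added another "Lorem" (in the input) because there was no second match outside the HTML tags. In any case, notice that nothing inside the HTML tags is modified. Here, only the second match is changed.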
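
One caveat: the function above counts text pieces that contain a match, not individual matches, and it replaces every match inside the chosen piece. If what is needed is the nth individual occurrence outside the protected tags, one possible variant is sketched below. The name str_ireplace_nth_outside_tags and the short sample input are hypothetical. It relies on the fact that, with a single capture group and PREG_SPLIT_DELIM_CAPTURE, the captured delimiters sit at the odd indexes of the result.

```php
<?php
// Hypothetical variant (not from the original answer): count every individual
// match outside <h1>/<h2>/<h3>/<img> and replace only the nth one.
function str_ireplace_nth_outside_tags(string $search, string $replace, string $subject, int $occurrence): string
{
    $word = '/\b' . preg_quote($search, '/') . '\b/i';

    // Same split as in the answer: protected tags become captured delimiters.
    $split = preg_split('/(<(?:h1|h2|h3|img)[^\/]+(?:\/|<\/[^>]+)>)/', $subject, -1, PREG_SPLIT_DELIM_CAPTURE);

    $seen = 0; // matches counted so far across all text pieces
    foreach ($split as $i => $piece) {
        if ($i % 2 === 1) {
            continue; // odd index: a captured protected tag, leave it untouched
        }
        $split[$i] = preg_replace_callback($word, function (array $m) use (&$seen, $occurrence, $replace) {
            // Return the replacement for the nth match, the original text otherwise.
            return (++$seen === $occurrence) ? $replace : $m[0];
        }, $piece);
    }

    return implode('', $split);
}

// Made-up sample: the match inside <h1> is skipped and not counted.
echo str_ireplace_nth_outside_tags('Lorem', '<a href="#">Lorem</a>', 'Lorem <h1>Lorem</h1> lorem Lorem', 2), PHP_EOL;
```

Because the replacement comes from a callback, it is inserted literally, so sequences such as $1 in the replacement text are not treated as backreferences. Matches inside other tags or attributes (for example a title attribute on a link) would still be counted, so this remains a sketch rather than a full HTML-aware solution.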

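I am not 100% sure exactly what you need (which is common with this kind of question), so I tried to explain how to do it rather than just doing it.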
