preg_match() with htmlentities() - Skip backslash-escaped quotes when matching the closing quote - Not greedy - Include newlines

Date: 2014-01-05 23:17:02

Tags: php regex pcre

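What I'm trying to do is use preg_replace() on anything inside htmlentities()-escaped quotes in any string. I don't want it to be greedy: if the string contains several quoted sections, a greedy match replaces everything from the first quote to the last one. Each match should run from one quote style to the next quote of that same style, and backslash-escaped quotes of the same type should be included in the match instead of ending it.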

For example:

$r = '"first
quote set begin capture for replacement

  \"these escaped quotes should be included for replacement\"

first quote set - end first capture for replacement here"

more stuff - should not be captured
\'second quote set begin capture for replacement

  \\\'these escaped quotes should be included for replacement\\\'

second quote set - end second capture for replacement here\'
`this would also be captured \` `
" this should be separate from first replacement "';
$strA = array('`', "'", '"');
foreach($strA as $v){
  $ste[] = htmlentities($v, ENT_QUOTES, 'UTF-8');
}
$r = preg_replace('/(('.implode('|', $ste).').*(\\\2)*.*\2)/Us', "<span class='sE'>$1</span>", $r);

Of course, the pattern above doesn't work, but it shows the concept. Each quoted section of $r should end up wrapped in <span class='sE'> tags.

Any help would be appreciated.

2 answers:

Answer 0 (score: 0)

You can use this (to illustrate Jack's idea):

$pattern = <<<'LOD'
~
    (['"`])
    (?> [^`"'\\]++ | \\{2} | \\. | (?!\1)["'`] )*
    \1
~xs
LOD;
$result = preg_replace_callback($pattern, function($m) {
    return '<span class="sE">'
         . str_replace(array('"', "'"), array('&quot;', '&#039;'), $m[0])
         . '</span>';
   }, $r);
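A minimal sketch of the first approach as a standalone script, for trying it on a short input. The helper name wrapQuoted() and the sample strings are illustrative assumptions; the pattern and the callback body are the ones from the answer. In the first sample the escaped \" pairs stay inside the double-quoted match, and in the second the apostrophe in "it's" does not end the double-quoted section.

```php
<?php
// Sketch only: answer 0's pattern wrapped in a hypothetical helper, wrapQuoted().
function wrapQuoted(string $r): string
{
    // Opening quote captured in \1; the body is runs of ordinary characters,
    // backslash escapes, or a quote of a different type; the same quote type closes it.
    $pattern = <<<'LOD'
~
    (['"`])
    (?> [^`"'\\]++ | \\{2} | \\. | (?!\1)["'`] )*
    \1
~xs
LOD;

    return preg_replace_callback($pattern, function (array $m): string {
        // Encode the quotes inside the matched section, then wrap it in a span.
        return '<span class="sE">'
             . str_replace(['"', "'"], ['&quot;', '&#039;'], $m[0])
             . '</span>';
    }, $r);
}

echo wrapQuoted('a "b \"c\" d" e \'f\''), "\n";
echo wrapQuoted('"it\'s"'), "\n";
```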

Another approach is to do the quote replacement first and then match the quoted sections afterwards:

$pattern = <<<'LOD'
~
    (&(?>quot|\#039);|`)    # "#" is escaped: in x mode a bare "#" starts a comment
    (?> [^&`\\]++ | \\{2} | \\. | (?!\1)[&`] )*
    \1
~xs
LOD;
$result = preg_replace($pattern,
                  '<span class="sE">$0</span>',
                  str_replace(array('"', "'"), array('&quot;', '&#039;'), $r));
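The second approach can be tried the same way. This sketch uses a hypothetical helper, wrapQuotedEncoded(), and made-up sample strings. Note the \# in the opening group: str_replace() produces &#039;, and with the x flag an unescaped # would start a comment, so it is escaped for &#039; to be matched as an opening or closing quote.

```php
<?php
// Sketch only: the "encode first, match afterwards" approach in a hypothetical helper.
function wrapQuotedEncoded(string $r): string
{
    // Opening entity (&quot; or &#039;) or backtick captured in \1.
    // "\#" is required because the x flag treats a bare "#" as a comment.
    $pattern = <<<'LOD'
~
    (&(?>quot|\#039);|`)
    (?> [^&`\\]++ | \\{2} | \\. | (?!\1)[&`] )*
    \1
~xs
LOD;

    // Encode both quote types first, then wrap each quoted section.
    $encoded = str_replace(['"', "'"], ['&quot;', '&#039;'], $r);
    return preg_replace($pattern, '<span class="sE">$0</span>', $encoded);
}

echo wrapQuotedEncoded('a "b \"c\" d" e \'f\''), "\n";
echo wrapQuotedEncoded('"it\'s"'), "\n";
```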

You can use htmlentities() instead of str_replace() in both examples (pass ENT_QUOTES so that single quotes are encoded as &#039; too).
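A quick check of the htmlentities() substitution. With ENT_QUOTES, as the question passes it, both quote types come out as the same entities the str_replace() calls produce, but htmlentities() also encodes &, < and >. The sample string is made up for illustration.

```php
<?php
// Compare the answer's str_replace() encoding with htmlentities(ENT_QUOTES).
$raw = 'say "hi" & \'bye\' <b>';

$viaStrReplace   = str_replace(['"', "'"], ['&quot;', '&#039;'], $raw);
$viaHtmlentities = htmlentities($raw, ENT_QUOTES, 'UTF-8');

echo $viaStrReplace, "\n";   // say &quot;hi&quot; & &#039;bye&#039; <b>
echo $viaHtmlentities, "\n"; // say &quot;hi&quot; &amp; &#039;bye&#039; &lt;b&gt;
```

For the second pattern the extra entities should be harmless: an & that does not begin the opening entity is consumed by the (?!\1)[&`] branch.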

Answer 1 (score: 0)

I think I figured it out myself:

$strA = array('`', "'", '"');
foreach($strA as $v){
  $ste[] = htmlentities($v, ENT_QUOTES, 'UTF-8');
}
$r = preg_replace('/((?<!\\\\)('.implode('|', $ste).').*(?<!\\\\)\2)/Us', "<span class='sE'>$1</span>", $r);

I still need to run a series of tests, but I think this works.
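A sketch for testing the self-answer's pattern. wrapSelfAnswer() is a hypothetical wrapper around the code above, and the input is passed through htmlentities() first, since the pattern looks for the encoded quotes. The first call shows escaped quotes being skipped as intended. The second shows a case worth adding to those tests: a real closing quote preceded by an escaped backslash (\\") is also rejected by the negative lookbehind, so the match runs on to the next quote. Answer 0's pattern appears to avoid this by consuming backslash escapes in pairs.

```php
<?php
// Sketch only: the self-answer's lookbehind pattern in a hypothetical wrapper.
function wrapSelfAnswer(string $r): string
{
    $ste = [];
    foreach (['`', "'", '"'] as $v) {
        // With the default HTML 4.01 table this yields '`', '&#039;', '&quot;'.
        $ste[] = htmlentities($v, ENT_QUOTES, 'UTF-8');
    }
    // Opening quote not preceded by a backslash, lazy body (U), and the same
    // quote type, also not preceded by a backslash, closes the match.
    return preg_replace(
        '/((?<!\\\\)(' . implode('|', $ste) . ').*(?<!\\\\)\2)/Us',
        "<span class='sE'>$1</span>",
        $r
    );
}

// Escaped quotes inside the section are skipped.
echo wrapSelfAnswer(htmlentities('a "b \"c\" d" e', ENT_QUOTES, 'UTF-8')), "\n";
// a <span class='sE'>&quot;b \&quot;c\&quot; d&quot;</span> e

// A quote after an escaped backslash is rejected too, so the match overshoots.
echo wrapSelfAnswer(htmlentities('"x\\\\" "y"', ENT_QUOTES, 'UTF-8')), "\n";
// <span class='sE'>&quot;x\\&quot; &quot;</span>y&quot;
```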