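Escape matching quotes, except in tag attributes

Date: 2012-02-17 10:38:16

Tags: php regex escaping

I want to escape matching quotes, except those inside a tag's attributes. For example:

Input: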

xyz <test foo='123 abc' bar="def 456"> f00 'escape me' b4r "me too" but not this </tEsT> blah 'escape " me'

Expected output:

xyz <test foo='123 abc' bar="def 456"> f00 \'escape me\' b4r \"me too\" but not this </tEsT> blah \'escape " me\'

I have the following regex:

$result = preg_replace('/(([\'"])((\\\2|.)*?)\2)/', "\\\\$2$3\\\\$2", $input);

which returns:

xyz <test foo=\'123 abc\' bar=\"def 456\"> f00 \'escape me\' b4r \"me too\" but not this </tEsT> blah \'escape " me\'

Now I want to use a zero-width negative lookbehind to skip matching quotes that are preceded by an equals sign:

$result = preg_replace('/((?<=[^=])([\'"])((\\\2|.)*?)\2)/', "\\\\$2$3\\\\$2", $input);

But the result is still not what I expected:

xyz <test foo='123 abc\' bar="def 456"> f00 \'escape me\' b4r "me too" but not this </tEsT> blah \'escape " me'

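Could you tell me how to skip the whole unwanted block (="blah blah blah") rather than just the first quote?

2 answers:

Answer 0: (score: 2)

Instead of looking behind to establish the context, look ahead. That is usually much easier.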

$result = preg_replace('/([\'"])(?![^<>]*>)((?:(?!\1|[<>]).)*)\1/',
                       '\\\\$1$2\\\\$1',
                       $subject);

The regex, piece by piece:

(['"])            # capture the opening quote
(?![^<>]*>)       # make sure it's not inside a tag
(                 # capture everything up to the next quote
  (?:             # ...after testing each character
    (?!\1|[<>]).  # ...to be sure it's not the opening quote
  )*              # ...or an angle bracket
)
\1                # match a closing quote of the same type as the opening one

I'm assuming there won't be any angle brackets in the attribute values.
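
For anyone who wants to try this, here is a minimal sketch that runs the pattern from the breakdown above against the sample input from the question. The `x` modifier, the `~` delimiter and the nowdoc strings are my own choices for readability, not part of the answer; the variable names `$pattern`, `$subject` and `$result` are just illustrative.

```php
<?php
// Sample input from the question (nowdoc, so quotes and backslashes stay literal).
$subject = <<<'TXT'
xyz <test foo='123 abc' bar="def 456"> f00 'escape me' b4r "me too" but not this </tEsT> blah 'escape " me'
TXT;

// Same pattern as in the breakdown, written in free-spacing mode (x modifier).
$pattern = <<<'REGEX'
~
(['"])             # capture the opening quote
(?![^<>]*>)        # reject it if a ">" comes before any "<" (we are inside a tag)
(                  # capture the quoted text
  (?:
    (?!\1|[<>]).   # any character except the same quote or an angle bracket
  )*
)
\1                 # closing quote of the same type
~x
REGEX;

// '\\\\' in a single-quoted PHP string is two backslashes, which
// preg_replace() turns into one literal backslash in the output.
$result = preg_replace($pattern, '\\\\$1$2\\\\$1', $subject);

echo $result, PHP_EOL;
// prints:
// xyz <test foo='123 abc' bar="def 456"> f00 \'escape me\' b4r \"me too\" but not this </tEsT> blah \'escape " me\'
```

The quotes inside the tag are left alone because the lookahead finds a `>` before the next `<`; the lone `"` inside 'escape " me' is left alone because it has no partner of the same type.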

Answer 1: (score: 1)

Here is another one.

$str = "xyz <test foo='123 abc' bar=\"def 456\"> f00 'escape me' b4r \"me too\" but not this <br/> <br/></tEsT> blah 'escape \" me'";

$str_escaped = preg_replace_callback('/(?<!\<)[^<>]+(?![^<]*\>)/','escape_quotes',$str);
// check all the strings outside every possible tag
// and replace each by the return value of the function below

function escape_quotes($str) {
    if (is_array($str)) $str = $str[0];
    return preg_replace('/(?<!\\\)(\'|")/','\\\$1',$str);
    // escape all the non-escaped single and double quotes
    // and return the escaped block
}
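
A self-contained sketch of the same two-step idea, run against the exact input from the question. The wrapper name `escape_outside_tags` and the anonymous callback are hypothetical choices of mine, not part of the answer. One thing worth noticing: this approach escapes every unescaped quote outside tags, so the unpaired `"` inside 'escape " me' is escaped as well, which differs from the expected output in the question.

```php
<?php
// Hypothetical wrapper around the two-step approach above.
function escape_outside_tags(string $text): string
{
    // Outer pattern: runs of text that are not inside <...>.
    return preg_replace_callback(
        '/(?<!<)[^<>]+(?![^<]*>)/',
        function (array $m): string {
            // Inner pattern: any quote not already preceded by a backslash.
            return preg_replace('/(?<!\\\\)([\'"])/', '\\\\$1', $m[0]);
        },
        $text
    );
}

// Sample input from the question (nowdoc keeps quotes literal).
$input = <<<'TXT'
xyz <test foo='123 abc' bar="def 456"> f00 'escape me' b4r "me too" but not this </tEsT> blah 'escape " me'
TXT;

$escaped = escape_outside_tags($input);

echo $escaped, PHP_EOL;
// prints:
// xyz <test foo='123 abc' bar="def 456"> f00 \'escape me\' b4r \"me too\" but not this </tEsT> blah \'escape \" me\'
```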