Variable-length lookbehind in a regular expression

Time: 2013-06-25 14:50:14

Tags: php regex search preg-replace lookbehind

My regular expression is as follows:

(?<![\s]*?(\"|&quot;))WORD(?![\s]*?(\"|&quot;))

As you can see, I am trying to match every instance of WORD unless it is inside "quotation marks". So...

WORD <- Find this
"WORD" <- Don't find this
"   WORD   " <- Also don't find this, even though WORD is not touching the quotation marks
&quot;WORD&quot;  <- Don't find this (I check for both &quot; and " so it still works after htmlspecialchars)

I believe my regex would work perfectly if I were not getting this error:

Compilation failed: lookbehind assertion is not fixed length
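
For reference, the error can be reproduced with a much smaller pattern. The sketch below uses two hypothetical, simplified patterns (not the full expression above) and assumes only that an unbounded quantifier such as \s* inside a lookbehind is what PCRE rejects. The exact wording of the warning may differ between PCRE versions.

```php
<?php
// Hypothetical minimal patterns, not the asker's full expression.
$variable = '/(?<!\s*")WORD/'; // unbounded \s* inside the lookbehind
$fixed    = '/(?<!")WORD/';    // exactly one character inside the lookbehind

// preg_match() returns false when the pattern fails to compile;
// the @ operator only silences the accompanying warning.
$variableResult = @preg_match($variable, 'WORD');
var_dump($variableResult === false);

// The fixed-length version compiles, but it only inspects the single
// character directly before WORD.
$plain   = preg_match($fixed, 'WORD');     // WORD is found
$touched = preg_match($fixed, '"WORD"');   // WORD is rejected
$spaced  = preg_match($fixed, '" WORD "'); // WORD is found: the space hides the quote
var_dump($plain, $touched, $spaced);
```

The last case is why a fixed-length lookbehind alone does not cover the `"   WORD   "` sample.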

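Given the limitations of lookbehind, is there any way to do what I want?

If you can think of any other approach, please let me know.

Many thanks,

Matthew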

P.S. The WORD part will actually contain John Gruber's URL detector.

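2 answers:

Answer 0 (score: 3)

I'd suggest a different approach. It works as long as the quotes are correctly balanced, because then you know you are inside a quoted string if and only if the number of quotes that follow is odd. That makes the lookbehind part unnecessary: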

if (preg_match(
'/WORD             # Match WORD
(?!                # unless it\'s possible to match the following here:
 (?:               # a string of characters
  (?!&quot;)       # that contains neither &quot;
  [^"]             # nor "
 )*                # (any length),
 ("|&quot;)        # followed by either " or &quot; (remember which in \1)
 (?:               # Then match
  (?:(?!\1).)*\1   # any string except our quote char(s), followed by that quote char(s)
  (?:(?!\1).)*\1   # twice,
 )*                # repeated any number of times --> even number
 (?:(?!\1).)*      # followed only by strings that don\'t contain our quote char(s)
 $                 # until the end of the string
)                  # End of lookahead/sx', 
$subject)) {
    // At least one WORD occurs outside a quoted string.
}
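
To see how this behaves, here is a minimal sketch that runs the same pattern, written on one line without the x-mode comments, against the samples from the question. The last sample and the variable names are additions for illustration only.

```php
<?php
// Same pattern as above, without the extended-mode comments.
$pattern = '/WORD(?!(?:(?!&quot;)[^"])*("|&quot;)(?:(?:(?!\1).)*\1(?:(?!\1).)*\1)*(?:(?!\1).)*$)/s';

$samples = [
    'WORD',              // from the question: should be found
    '"WORD"',            // from the question: should not be found
    '"   WORD   "',      // from the question: should not be found
    '&quot;WORD&quot;',  // from the question: should not be found
    'WORD "WORD" WORD',  // added illustration: only the two outer WORDs
];

$counts = [];
foreach ($samples as $subject) {
    // preg_match_all() returns the number of full matches.
    $counts[$subject] = preg_match_all($pattern, $subject);
    echo $subject, ' => ', $counts[$subject], "\n";
}
```

Note that the count relies on the quotes being balanced, as stated above. A stray quote character later in the string changes the parity and flips the result.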

Answer 1 (score: 1)

I'd suggest removing the quoted strings first and then searching whatever is left.

$noSubs = preg_replace('/(["\']|&quot;)(\\\\\1|(?!\1).)*\1/', '', $target);
$n = preg_match_all('/\bWORD\b/', $noSubs, $matches);
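
The two calls can be wrapped in a small helper to try them on sample input. This is only a sketch: the function name countUnquotedWord is hypothetical, and the error handling is an assumption rather than part of the answer.

```php
<?php
// Hypothetical helper wrapping the two calls above.
function countUnquotedWord(string $target): int
{
    // Step 1: strip everything between matching ", ' or &quot; delimiters.
    $noSubs = preg_replace('/(["\']|&quot;)(\\\\\1|(?!\1).)*\1/', '', $target);
    if ($noSubs === null) {
        throw new RuntimeException('Stripping quoted strings failed');
    }
    // Step 2: count the WORDs that survive.
    return (int) preg_match_all('/\bWORD\b/', $noSubs);
}

$mixed      = countUnquotedWord('WORD "WORD" WORD');
$entity     = countUnquotedWord('&quot;WORD&quot;');
$apostrophe = countUnquotedWord("don't WORD isn't");
echo $mixed, ' ', $entity, ' ', $apostrophe, "\n";
```

The third call illustrates a limitation: because ' is treated as a delimiter, the two apostrophes are read as a quoted string and the WORD between them is removed before the search.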

The regex I used above to strip quoted strings treats &quot;, " and ' as separate string delimiters. For any single delimiter, the regex would look more like this:

/"(\\"|[^"])*"/

So, if you want to treat &quot; as equivalent to ":

/("|&quot;)(\\("|&quot;)|(?!&quot;)[^"])*("|&quot;)/i

If you also want to handle single-quoted strings (assuming there are no words containing apostrophes):

/("|&quot;)(\\("|&quot;)|(?!&quot;)[^"])*("|&quot;)|'(\\'|[^'])*'/i

Be careful with the escaping when you turn these patterns into PHP strings.
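
As a rough illustration of that pitfall, the sketch below writes the single-delimiter pattern into a single-quoted PHP string twice: once copied literally and once with each backslash doubled. The variable names and the sample subject are made up for the example.

```php
<?php
$subject = '"a" WORD "b"';

// Copied literally: PHP turns \\ into \, so PCRE receives \" which is
// just a literal quote, and the "escaped quote" alternative is lost.
$underEscaped = '/"(\\"|[^"])*"/';

// Backslashes doubled: PHP turns \\\\ into \\, so PCRE receives \\"
// (a literal backslash followed by a quote), as intended.
$escaped = '/"(\\\\"|[^"])*"/';

$wrong = preg_replace($underEscaped, '', $subject);
$right = preg_replace($escaped, '', $subject);

var_dump($wrong); // everything from the first quote to the last is removed
var_dump($right); // only the two quoted strings are removed
```

With the under-escaped pattern, the group also accepts quote characters, so the match runs greedily from the first quote to the last one and swallows the unquoted WORD in between.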

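Edit

Qtax mentioned that you may be trying to replace the matched WORD data. In that case, you can easily tokenize the string with a regex like this:

/("|&quot;)(\\("|&quot;)|(?!&quot;)[^"])*("|&quot;)|((?!"|&quot;).)+/i

into quoted strings and unquoted segments, and then build a new string in which the replacement is applied only to the unquoted parts: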

$tokenizer = '/("|&quot;)(\\\\("|&quot;)|(?!&quot;)[^"])*("|&quot;)|((?!"|&quot;).)+/i';
$hasQuote = '/"|&quot;/i';
$word = '/\bWORD\b/';
$replacement = 'REPLACEMENT';
$n = preg_match_all($tokenizer, $target, $matches, PREG_SET_ORDER);
$newStr = '';
if ($n === false) {
    /* Print error Message */
    die();
}
foreach($matches as $match){
    if(preg_match($hasQuote, $match[0])){
        //If it has a quote, it's a quoted string.
        $newStr .= $match[0];
    } else {
        //Otherwise, run the replace.
        $newStr .= preg_replace($word, $replacement, $match[0]);
    }
}

//Now $newStr has your replaced String.  Return it from your function, or print it to
//your page.
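
A more compact variant of the same idea can be sketched with preg_replace_callback, which is a swapped-in technique rather than what the answer uses. The function name replaceOutsideQuotes and its signature are hypothetical. One difference worth noting: preg_replace_callback leaves any text the tokenizer does not match (such as a lone, unbalanced quote) in place, whereas the loop above only concatenates matched tokens.

```php
<?php
// Hypothetical helper; the name and signature are illustrative only.
function replaceOutsideQuotes(string $target, string $word, string $replacement): string
{
    // Same tokenizer and quote test as in the loop version above.
    $tokenizer = '/("|&quot;)(\\\\("|&quot;)|(?!&quot;)[^"])*("|&quot;)|((?!"|&quot;).)+/i';
    $hasQuote  = '/"|&quot;/i';

    $result = preg_replace_callback(
        $tokenizer,
        function (array $match) use ($hasQuote, $word, $replacement): string {
            if (preg_match($hasQuote, $match[0])) {
                // Quoted token: keep it untouched.
                return $match[0];
            }
            // Unquoted segment: run the replacement (fall back to the
            // original text if the inner regex fails).
            return preg_replace($word, $replacement, $match[0]) ?? $match[0];
        },
        $target
    );

    if ($result === null) {
        throw new RuntimeException('Tokenizer regex failed');
    }
    return $result;
}

$balanced = replaceOutsideQuotes('WORD "WORD" &quot;WORD&quot; WORD', '/\bWORD\b/', 'REPLACEMENT');
$lone     = replaceOutsideQuotes('WORD " WORD', '/\bWORD\b/', 'REPLACEMENT');
echo $balanced, "\n", $lone, "\n";
```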