Question

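My content contains foo == 'bar test baz' and test.asd = "buz foo". I need to match the "identifiers", meaning the ones on the left that are not inside double or single quotes. This is what I have so far: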

preg_replace_callback('#([a-zA-Z\\.]+)#', function($matches) {
    var_dump($matches);
}, $subject);

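Right now it also matches the ones inside the strings. How do I write a pattern that does not match inside strings?

Another example: foo == 5 AND bar != 'buz' OR fuz == 'foo bar fuz luz'. So in essence, match a-zA-Z that is not inside a string.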

Answer 1

/^[^'"=]*/

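will work for your examples. It matches any number of characters (starting at the beginning of the string) that are neither quotes nor equals signs.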

/^[^'"=\s]*/

additionally avoids matching whitespace, which may or may not be what you need.
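As a quick check, the second pattern can be tried with preg_match. This is only a minimal sketch; the helper name leadingIdentifier is made up for illustration and is not part of the answer.

```php
<?php
// Hypothetical helper: returns the text before the first quote, '=' or whitespace.
function leadingIdentifier(string $subject): string
{
    // The pattern can match the empty string, so preg_match always finds a match.
    preg_match('/^[^\'"=\s]*/', $subject, $m);
    return $m[0];
}

echo leadingIdentifier("foo == 'bar test baz'"), "\n";     // foo
echo leadingIdentifier('test.asd = "buz foo"'), "\n";      // test.asd
echo leadingIdentifier("foo == 5 AND bar != 'buz'"), "\n"; // foo
```

The last call shows the limitation: because the pattern is anchored at the start of the string, it only ever finds the first identifier.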

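Edit

You are asking how to match letters (and possibly dots?) outside quoted sections anywhere in the text. That is more complicated. A regex that can correctly determine whether it is currently outside a quoted string (by making sure that the number of quotes, not counting escaped quotes and nested quotes, is even) looks like this as a PHP regex: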

'/(?:
 (?=      # Assert even number of (relevant) single quotes, looking ahead:
  (?:
   (?:\\\\.|"(?:\\\\.|[^"\\\\])*"|[^\\\\\'"])*
   \'
   (?:\\\\.|"(?:\\\\.|[^"\'\\\\])*"|[^\\\\\'])*
   \'
  )*
  (?:\\\\.|"(?:\\\\.|[^"\\\\])*"|[^\\\\\'])*
  $
 )
 (?=      # Assert even number of (relevant) double quotes, looking ahead:
  (?:
   (?:\\\\.|\'(?:\\\\.|[^\'\\\\])*\'|[^\\\\\'"])*
   "
   (?:\\\\.|\'(?:\\\\.|[^\'"\\\\])*\'|[^\\\\"])*
   "
  )*
  (?:\\\\.|\'(?:\\\\.|[^\'\\\\])*\'|[^\\\\"])*
  $
 )
 ([A-Za-z.]+) # Match ASCII letters/dots
)+/x'

An explanation can be found here. But a regex is probably not the right tool for this.
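To illustrate what a non-regex approach might look like, here is a rough sketch of a character-by-character scanner. It is not from the answer; the function name scanIdentifiers is hypothetical, and it assumes that a backslash inside a quoted string escapes the next character.

```php
<?php
// Hypothetical scanner: collects runs of ASCII letters/dots that sit outside quotes.
function scanIdentifiers(string $subject): array
{
    $identifiers = [];
    $current = '';
    $quote = null;               // the quote character we are inside, or null
    $length = strlen($subject);

    for ($i = 0; $i < $length; $i++) {
        $ch = $subject[$i];

        if ($quote !== null) {
            if ($ch === '\\') {
                $i++;            // assumption: backslash escapes the next character
            } elseif ($ch === $quote) {
                $quote = null;   // closing quote
            }
            continue;
        }

        if ($ch === '"' || $ch === "'") {
            $quote = $ch;        // opening quote
        } elseif (ctype_alpha($ch) || $ch === '.') {
            $current .= $ch;
            continue;
        }

        // Any other character (or an opening quote) ends the current identifier.
        if ($current !== '') {
            $identifiers[] = $current;
            $current = '';
        }
    }

    if ($current !== '') {
        $identifiers[] = $current;
    }
    return $identifiers;
}

print_r(scanIdentifiers("foo == 5 AND bar != 'buz' OR fuz == 'foo bar fuz luz'"));
// foo, AND, bar, OR, fuz
```

Because the scanner remembers which quote character opened the string, a value such as "it's" does not end the string early.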

Answer 2

You could also try this:

preg_match_all('/[\w.]+(?=(?:[^\'"]|[\'"][^\'"]*["\'])*$)/', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
    # Matched text = $result[0][$i];
}

This matches all letters, digits, and _ as well as dots outside quotes. You can extend the set by adding the allowed characters to [\w.].
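For illustration, the same pattern can be wrapped in a small function and run against the examples from the question. This is a sketch only; the function name identifiersOutsideQuotes is hypothetical.

```php
<?php
// Hypothetical wrapper around the lookahead pattern from this answer.
function identifiersOutsideQuotes(string $subject): array
{
    // The lookahead succeeds only if the rest of the string consists of
    // non-quote characters and complete quoted sections.
    $pattern = '/[\w.]+(?=(?:[^\'"]|[\'"][^\'"]*["\'])*$)/';
    preg_match_all($pattern, $subject, $result, PREG_PATTERN_ORDER);
    return $result[0];
}

print_r(identifiersOutsideQuotes('test.asd = "buz foo"'));
// test.asd
print_r(identifiersOutsideQuotes("foo == 5 AND bar != 'buz' OR fuz == 'foo bar fuz luz'"));
// foo, 5, AND, bar, OR, fuz
```

Note that \w also matches digits, so the bare 5 is returned. The pattern treats ' and " interchangeably, so a string that contains the other kind of quote would likely confuse it.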

Answer 3

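The trick I use here is to force the regex to take a separate branch whenever it encounters a quote, and then we simply ignore that branch.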

$subject = <<<END
foo == 'bar test baz' and test.asd = "buz foo"
foo == 5 AND bar != 'buz' OR fuz == 'foo bar fuz luz'
END;

$regexp = '/(?:["\'][^"\']+["\']|([a-zA-Z\\.]+\b))/';

preg_replace_callback($regexp, function($matches) {
    if( count($matches) >= 2 ) {
        print trim($matches[1]).' ';
    }
}, $subject);

// Output: 'foo and test.asd foo AND bar OR fuz '

The main part of the regex is

(?: anything between quotes | any word consisting of a-zA-Z )
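If the identifiers are needed as an array rather than printed, the same branching idea can be sketched with preg_match_all. The function name identifiersViaBranch is hypothetical; the pattern is the one from this answer.

```php
<?php
// Hypothetical helper built on the branching pattern from this answer.
function identifiersViaBranch(string $subject): array
{
    $regexp = '/(?:["\'][^"\']+["\']|([a-zA-Z\\.]+\b))/';
    preg_match_all($regexp, $subject, $matches, PREG_SET_ORDER);

    $identifiers = [];
    foreach ($matches as $m) {
        // Group 1 is only filled when the identifier branch matched;
        // matches of the quoted-string branch are skipped.
        if (isset($m[1]) && $m[1] !== '') {
            $identifiers[] = $m[1];
        }
    }
    return $identifiers;
}

print_r(identifiersViaBranch("foo == 5 AND bar != 'buz' OR fuz == 'foo bar fuz luz'"));
// foo, AND, bar, OR, fuz
```

One caveat: the quoted branch uses +, so it needs at least one character between the quotes. An empty string such as '' is not consumed by that branch and can throw off the pairing of the quotes that follow.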

PHP regex to match text that is not inside a string
