Question

I'm fine with basic regular expressions, but I get a bit lost around positive/negative lookahead and lookbehind.

I'm trying to pull the id number out of this:

[keyword stuff=otherstuff id=123 morestuff=stuff]

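There could be an unlimited amount of "stuff" before or after the id. I've been using The Regex Coach to help debug what I've tried, but I'm not making any progress anymore...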

So far, I have this:

\[keyword (?:id=([0-9]+))?[^\]]*\]

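It handles any extra attributes after the id, but I can't figure out how to ignore everything between keyword and id. I know I can't use [^id]*. I believe I need a negative lookahead like (?!id)*, but I think that since it's zero-width, it never moves forward from there. This doesn't work either: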

\[keyword[A-z0-9 =]*(?!id)(?:id=([0-9]+))?[^\]]*\]

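I've been looking all over for examples, but haven't found any. Or perhaps I have, but they went so far over my head that I didn't even realize what they were.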

Help! Thanks.

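EDIT: It must also match [keyword stuff=otherstuff], where id= doesn't appear at all, so the id group has to match 1 or 0 times. There are also other tags like [otherkeywords id=32] that I don't want to match. The document needs to match multiple [keyword id=3] tags throughout, using preg_match_all.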

Answer 1

No lookahead/lookbehind necessary:

/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/

The trailing '[^\]]*?\]' is there to check for a real tag end; it may be unnecessary.

Edit: added \b before id, otherwise it could match [keyword you-dont-want-this-guid=123123-132123-123 id=123]

$ php -r 'preg_match_all("/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/","[keyword stuff=otherstuff morestuff=stuff]",$matches);var_dump($matches);'
array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(42) "[keyword stuff=otherstuff morestuff=stuff]"
  }
  [1]=>
  array(1) {
    [0]=>
    string(0) ""
  }
}
$ php -r 'var_dump(preg_match_all("/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/","[keyword stuff=otherstuff id=123 morestuff=stuff]",$matches),$matches);'
int(1)
array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(49) "[keyword stuff=otherstuff id=123 morestuff=stuff]"
  }
  [1]=>
  array(1) {
    [0]=>
    string(3) "123"
  }
}
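
To tie this back to the edit in the question (tags without an id, other keywords that must be skipped, many tags per document), here is a minimal sketch that runs the same pattern over several tags at once. The sample document and the $ids variable are made up for illustration. PREG_SET_ORDER is used so that each tag's match and its optional id stay together.

```php
<?php
// Pattern from the answer above; the group holding the id is optional.
$pattern = '/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/';

// Hypothetical sample document, not from the question.
$document = 'a [keyword stuff=otherstuff id=123 morestuff=stuff] '
          . 'b [otherkeyword id=32] '
          . 'c [keyword stuff=otherstuff] '
          . 'd [keyword guid=999 id=7]';

// PREG_SET_ORDER yields one array per matched tag.
preg_match_all($pattern, $document, $matches, PREG_SET_ORDER);

$ids = [];
foreach ($matches as $m) {
    // When the tag has no id attribute, group 1 is absent or empty.
    $ids[] = (isset($m[1]) && $m[1] !== '') ? $m[1] : null;
}

// $ids is ['123', null, '7']: [otherkeyword ...] is skipped,
// and guid=999 is not mistaken for an id thanks to \b.
var_dump($ids);
```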

Answer 2

You don't need lookahead/lookbehind.

Since the question is tagged PHP, use preg_match_all() and store the matches in $matches.

Here's how:

<?php

  // Store the string. I single quote, in case there are backslashes I
  // didn't see.
$string = 'blah blah[keyword stuff=otherstuff id=123 morestuff=stuff]
           blah blah[otherkeyword stuff=otherstuff id=555 morestuff=stuff]
           blah blah[keyword stuff=otherstuff id=444 morestuff=stuff]';

  // The pattern is '[keyword', then any run of characters other than ']',
  // then a space and 'id='.
  // The space before id is important, so you don't catch 'guid', etc.
  // If '[keyword' is always at the beginning of a line, you can use
  // '^\[keyword'
$pattern = '/\[keyword[^\]]* id=([0-9]+)/';

  // Find every single $pattern in $string and store it in $matches
preg_match_all($pattern, $string, $matches);

  // The only tricky part you have to know is that each entire match is stored in
  // $matches[0][x], and the part of the match in the parentheses, which is what
  // you want, is stored in $matches[1][x]. The braces are optional, since the
  // loop body is only one line.
foreach($matches[1] as $value)
{
    echo $value . "<br/>";
}
?>

Output:

123
444

(555 is skipped, as it should be.)

P.S.

If the separator could be a tab, you can also use \b instead of a literal space. \b represents a word boundary, in this case the beginning of a word.

$pattern = '/\[keyword[^\]]*\bid=([0-9]+)/';
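
A quick way to see the difference between the two patterns is to run both on the same input. This is only a sketch: the input string and the variable names are invented for the comparison, with one tag that uses a tab before id and one that has only a guid attribute.

```php
<?php
// The two patterns from this answer: literal space vs. word boundary.
$withSpace    = '/\[keyword[^\]]* id=([0-9]+)/';
$withBoundary = '/\[keyword[^\]]*\bid=([0-9]+)/';

// Hypothetical input: the second tag has a tab before id,
// the third tag has a guid attribute but no id.
$string = "[keyword stuff=a id=1] [keyword stuff=b\tid=2] [keyword guid=3]";

preg_match_all($withSpace, $string, $a);
preg_match_all($withBoundary, $string, $b);

// $a[1] is ['1']: the tab-separated id is missed.
// $b[1] is ['1', '2']: the tab is fine, and guid=3 is still rejected.
print_r($a[1]);
print_r($b[1]);
```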

Answer 3

I think this is what you were getting at:

\[keyword(?:\s+(?!id\b)[A-Za-z]+=[^\]\s]+)*(?:\s+id=([0-9]+))?[^\]]*\]

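(I'm assuming attribute names can contain only ASCII letters, while values can contain any non-whitespace characters except ].)

(?:\s+(?!id\b)[A-Za-z]+=[^\]\s]+)* matches any number of attribute=value pairs (and the whitespace preceding them), as long as the attribute name isn't id. The \b (word boundary) is there in case an attribute name merely starts with id, like idiocy. There's no need to put a \b before the attribute name this time, because you know any name it matches will be preceded by whitespace. But, as you've learned, the lookahead approach is overkill in this case.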
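
As a rough check of that explanation, the sketch below runs the pattern against three sample tags, including one with an attribute named idiocy. The samples and the $ids variable are made up for illustration.

```php
<?php
// Pattern from this answer, wrapped in delimiters for preg_match().
$pattern = '/\[keyword(?:\s+(?!id\b)[A-Za-z]+=[^\]\s]+)*(?:\s+id=([0-9]+))?[^\]]*\]/';

// Hypothetical sample tags.
$samples = [
    '[keyword stuff=otherstuff id=123 morestuff=stuff]',
    '[keyword idiocy=yes id=45]',   // name starts with "id" but is not id
    '[keyword stuff=otherstuff]',   // no id at all
];

$ids = [];
foreach ($samples as $sample) {
    preg_match($pattern, $sample, $m);
    // Group 1 is absent (or empty) when the optional id part did not match.
    $ids[] = (isset($m[1]) && $m[1] !== '') ? $m[1] : null;
}

// $ids is ['123', '45', null].
var_dump($ids);
```

In the second sample, (?!id\b) lets idiocy=yes through as an ordinary attribute, because there is no word boundary after the first two letters, and the real id=45 is then picked up by the optional group.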

Now, about this:

[A-z0-9 =]

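That A-z is either a typo or a mistake. If you're expecting it to match all uppercase and lowercase letters, well, it does. But it also matches

'[', ']', '^', '_', '`' and '\'

...because their code points lie between those of the uppercase letters and the lowercase letters. The ASCII letters, that is.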
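
If you want to see this for yourself, here is a small sketch that lists the characters between 'Z' and 'a' in ASCII and tests each one against both character classes. The variable names are arbitrary.

```php
<?php
// Collect the characters whose code points lie strictly between 'Z' (90) and 'a' (97).
$between = [];
for ($code = ord('Z') + 1; $code < ord('a'); $code++) {
    $between[] = chr($code);
}

// Each of them matches [A-z] (1) but not [A-Za-z] (0).
foreach ($between as $ch) {
    printf(
        "%s  [A-z]: %d  [A-Za-z]: %d\n",
        $ch,
        preg_match('/^[A-z]$/', $ch),
        preg_match('/^[A-Za-z]$/', $ch)
    );
}
```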

Using a regex to skip all characters until a specific sequence of letters is found, with negative lookahead