Question

I am trying to read the server names out of an nginx config file.

I need to apply a regex to a line like this:

server_name this.com www.this.com someother-example.com;

I am using PHP's preg_match_all(), and so far I have tried a few different patterns:

/^(?:server_name[\s]*)(?:(.*)(?:\s*))*;$/m
// no output

/^(?:server_name[\s]*)((?:(?:.*)(?:\s*))*);$/m
//  this.com www.this.com someother-example.com

But I can't find the right pattern to list the domains as separate values, like this:

[  
    0 => 'this.com',  
    1 => 'www.this.com',  
    2 => 'someother-example.com'  
]

Answer 1

Bob's your uncle wrote:

(?:server_name|\G(?!^))\s*\K[^;|\s]+

That does the trick!
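As a quick check, here is a minimal sketch of how that pattern might be run through preg_match_all(). The `~` delimiters and the variable names are my own choices, not part of the answer, and the input is the sample line from the question.

```php
<?php
// Sample line taken from the question.
$line = 'server_name this.com www.this.com someother-example.com;';

// The pattern from this answer, wrapped in ~ delimiters (delimiter choice is an assumption).
// preg_match_all() returns the number of full-pattern matches and fills $matches.
$count = preg_match_all('~(?:server_name|\G(?!^))\s*\K[^;|\s]+~', $line, $matches);

// $matches[0] holds the full-pattern matches, i.e. the text matched after \K.
var_export($matches[0]);
```

Because \K discards everything matched before it, no capture group is needed and the domains end up directly in `$matches[0]`.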

Answer 2

In plain English, the requirement is to extract the space-delimited strings that immediately follow server_name and the run of spaces after it.

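The dynamic duo of \G (match at the start of the string, or continue from the end of the previous match) and \K (restart the fullstring match) will be the heroes of the day.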

Code:

$string = "server_name    this.com www.this.com someother-example.com;";

var_export(preg_match_all('~(?:server_name +|\G(?!^) )\K[^; ]+~', $string, $out) ? $out[0] : 'no matches');

Output:

array (
  0 => 'this.com',
  1 => 'www.this.com',
  2 => 'someother-example.com',
)

Pattern explanation:

(?:                  # start of non-capturing group (to separate piped expressions from end of the pattern)
  server_name +      # literally match "server_name" followed by one or more spaces
  |                  # OR
  \G(?!^)            # continue from the end of the previous match (but not from the start of the string), then match the single literal space that follows \G(?!^) in the pattern
)                    # end of the non-capturing group
\K                   # restart the fullstring match (aka forget any previously matched characters in "this run through")
[^; ]+               # match one or more characters that are NOT a semicolon or a space

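The reason you see \G(?!^) rather than a bare \G (which, for the record, works fine on your sample input) is that \G can, by default, match at two different points: at the start of the string and at the end of the previous match. https://www.regular-expressions.info/continue.html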

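If you used the bare \G version of the pattern and the input string had a leading space, you would not get the intended matches. \G would succeed at the start of the string, the pattern would then match the single space, and the negated character class [^; ] would go on to match server_name itself.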

So disabling \G's ability to match at the start of the string makes the pattern more stable, reliable, and accurate.
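The difference can be sketched as follows. The input with one leading space is a hypothetical variation of the sample line, and the variable names are mine.

```php
<?php
// Hypothetical input: the sample line shortened and given ONE leading space.
$line = ' server_name this.com www.this.com;';

// Bare \G: allowed to match at the very start of the string.
$bare = '~(?:server_name +|\G )\K[^; ]+~';

// \G(?!^): only allowed to continue from the end of a previous match.
$guarded = '~(?:server_name +|\G(?!^) )\K[^; ]+~';

preg_match_all($bare, $line, $bareOut);
preg_match_all($guarded, $line, $guardedOut);

var_export($bareOut[0]);    // 'server_name' leaks into the results
echo "\n";
var_export($guardedOut[0]); // only the two domains
```

With the bare \G, the first "match" is the literal text server_name, because the leading space satisfies the `\G ` branch at offset 0. The guarded version skips that position and only starts matching once server_name has been seen.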

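preg_match_all() returns the number of full-pattern matches and fills its third argument with an array of matches. The first element, [0], is the collection of fullstring matches (whatever was matched, regardless of capture groups). If there are any capture groups, they start at [1] and increment with each new group.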

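Because server_name has to be matched before the substrings to be extracted can be located, using a capture group would mean a bloated output array and an unusable [0] subarray of fullstring matches.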
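To illustrate the shape of that array, here is a small sketch of the same pattern written with a capture group instead of \K. This variant is my own rewrite for comparison, not a pattern from either answer.

```php
<?php
$line = 'server_name    this.com www.this.com someother-example.com;';

// Capture-group variant (an illustrative rewrite): the domain is captured in group 1,
// but the fullstring match still includes "server_name" and the separating spaces.
$count = preg_match_all('~(?:server_name +|\G(?!^) )([^; ]+)~', $line, $out);

var_export($count); // the return value: number of full-pattern matches
echo "\n";
var_export($out);   // [0] = fullstring matches, [1] = contents of capture group 1
```

Here `$out[1]` holds the clean domains, while `$out[0]` carries the unwanted leading text, so the array is twice as large as it needs to be.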

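To extract the desired space-delimited substrings and omit server_name from the results, \K is used to "forget" the characters matched so far before the desired substring is found. https://www.regular-expressions.info/keep.html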

Without \K to purge the unwanted leading characters, the output would be:

array (
  0 => 'server_name    this.com',
  1 => ' www.this.com',
  2 => ' someother-example.com',
)

If anyone is comparing my answer to user3776824's or HamZa's:

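I chose to be very literal with space-character matching. There are 4 spaces after server_name, so I could have used the exact quantifier {4}, but I opted for some flexibility here. \s* is not the ideal choice because there will always be one or more spaces to match, so a "zero or more" quantifier is looser than it needs to be. I have no problem with \s, but to be clear, it matches spaces, tabs, newlines, and carriage returns.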
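I am using (?!^), a negative lookahead, rather than (?<!^), a negative lookbehind, because it does the same job with fewer characters. You will more often see \G(?!^) used by experienced regex craftsmen.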
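There is never any need to use "alternation" syntax (|) inside a character class to separate values. user3776824's pattern actually excludes pipes in addition to semicolons and whitespace, although based on the sample data I don't expect any negative effect on the result. The pipe simply should not be written in the pattern.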
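The point about the pipe can be checked with a short sketch. The input string containing a pipe is hypothetical and chosen only to make the difference visible.

```php
<?php
// Hypothetical input containing a literal pipe.
$text = 'a|b.example';

// Inside a character class, | is a literal pipe, not alternation,
// so this class excludes semicolons, pipes, and whitespace.
preg_match('~[^;|\s]+~', $text, $withPipe);

// Without the pipe, the class excludes only semicolons and whitespace.
preg_match('~[^;\s]+~', $text, $withoutPipe);

var_export($withPipe[0]);    // stops at the pipe
echo "\n";
var_export($withoutPipe[0]); // consumes the whole string
```

Real domain names do not contain pipes, which is why the extra character makes no difference on the sample data.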

How do I match multiple substrings that occur after a specific substring?
