Question

I need to extract 3 groups from the following text:

[startA]
this is the first group
 [startB]
 blabla
[end]
[end]
[startA]
this is the second group
 [startB]
 blabla
[end]
[end]
[startA]
this is the third group
 [startB]
 blabla
[end]
[end]

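As you can see, each group starts with [startA] and ends with [end], so it should be easy to write a regex that matches this.
The problem is that the string [end] also appears an arbitrary number of times inside a group. The regex should match a group that starts with [startA] and ends with the [end] immediately before the next [startA], not with an earlier [end].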

I think this calls for a lookahead, but none of my attempts so far have worked. Is it possible to do this with a regular expression?

Answer 1

You should use a recursive regex:

preg_match_all('/\[(?!end)[^[\]]+\](?:[^[\]]*|[^[\]]*(?R)[^[\]]*)\[end\]\s*/', $s, $m);

See this demo.
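To see the recursive pattern in action, here is a minimal sketch that runs it against the sample text from the question. The variable names ($s, $m, $count) and the printing loop are only illustrative, and the third group is labelled "third" here as in Answer 2.

```php
<?php
// Sample input modelled on the question (three top-level groups).
$s = <<<TEXT
[startA]
this is the first group
 [startB]
 blabla
[end]
[end]
[startA]
this is the second group
 [startB]
 blabla
[end]
[end]
[startA]
this is the third group
 [startB]
 blabla
[end]
[end]
TEXT;

// (?R) recurses into the whole pattern, so a nested [startB]...[end]
// pair is consumed before the outer [end] is matched.
$pattern = '/\[(?!end)[^[\]]+\](?:[^[\]]*|[^[\]]*(?R)[^[\]]*)\[end\]\s*/';

$count = preg_match_all($pattern, $s, $m);

// $m[0] holds the full text of each top-level group.
foreach ($m[0] as $i => $group) {
    echo "--- group ", $i + 1, " ---\n", trim($group), "\n";
}
```

Note that, as written, the pattern allows at most one nested block directly inside each group; several sibling blocks at the same level would need a quantifier around the recursive part.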

Answer 2

Yes, you can indeed solve this with a lookahead:

$test_string = <<<TEST
[startA]
this is the first group
 [startB]
 blabla
[end]
[end]
[startA]
this is the second group
 [startB]
 blabla
[end]
[end]
[startA]
this is the third group
 [startB]
 blabla
[end]
[end]
TEST;
preg_match_all('#\[startA](.+?)\[end]\s*(?=\[startA]|$)#s', 
    $test_string, $matches);
var_dump($matches[1]);

Here is an ideone demo.

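The key is the alternation inside the lookahead subpattern: it checks that what follows is either the next [startA] section or the end of the string ($).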
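The effect of the lookahead is easiest to see by comparing the pattern with and without it. The following is a small sketch on a shortened, made-up input (two groups instead of three); $naive and $guarded are illustrative names.

```php
<?php
// Shortened input: two groups, each with one nested [startB]...[end].
$text = "[startA]\none\n [startB]\n x\n[end]\n[end]\n"
      . "[startA]\ntwo\n [startB]\n y\n[end]\n[end]\n";

// Without the lookahead, the lazy .+? stops at the first [end] it meets,
// so the captured text is cut off inside the nested block.
preg_match_all('#\[startA](.+?)\[end]#s', $text, $naive);

// With the lookahead, a match may only end where the next [startA]
// begins or where the string ends, so the inner [end] is skipped.
preg_match_all('#\[startA](.+?)\[end]\s*(?=\[startA]|$)#s', $text, $guarded);

var_dump($naive[1]);   // captures end before the inner [end]
var_dump($guarded[1]); // captures include the inner [end]
```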

Note the /s modifier: without it, the . metacharacter does not match newline characters ("\n").
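A quick way to check the role of the s modifier is to run the same pattern with and without it. This is a sketch on a made-up two-group input; $without and $with are illustrative names.

```php
<?php
$text = "[startA]\nfirst\n[end]\n[startA]\nsecond\n[end]";

// Without s, "." cannot match "\n", so .+? cannot get past the
// line break that follows [startA] and nothing matches.
$without = preg_match_all('#\[startA](.+?)\[end]\s*(?=\[startA]|$)#', $text, $a);

// With s (PCRE_DOTALL), "." also matches "\n" and both groups are found.
$with = preg_match_all('#\[startA](.+?)\[end]\s*(?=\[startA]|$)#s', $text, $b);

var_dump($without, $with);
```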

Regex: problem selecting multiple groups
