Finding matching brackets in a longer text with PHP / regex

Time: 2012-02-03 09:44:41

Tags: php regex

I have been trying to wrap my head around this problem for quite a while, but still haven't found a solution.

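I'm working on a simple formatting scheme in which a tag wraps a string in curly braces, with the tag name written directly before the opening brace. Tags should also be allowed inside the braces of other tags.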

The string:

This is some random text, tag1{while this is inside a tag2{tag}}. This is some
other text tag2{also with a tag  tag3{inside} of it}.

What I want to do now is get the content of each of these tags:

tag1{}
tag2{}
tag3{}

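I've found others with a similar problem (Find matching brackets using regular expression), but their question was more about how to find matching brackets inside other brackets, whereas my problem is both that and finding multiple bracketed tags in a longer text.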

4 Answers:

Answer 0 (score: 3)

If the tags are always balanced, you can use an expression like this to get the name and content of every tag, including nested ones.

\b(\w+)(?={((?:[^{}]+|{(?2)})*)})

Example

$str = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag2{also with a tag  tag3{inside} of it}.";

$re = "/\\b(\\w+)(?={((?:[^{}]+|{(?2)})*)})/";
preg_match_all($re, $str, $m);

echo "* Tag names:\n";
print_r($m[1]);
echo "* Tag content:\n";
print_r($m[2]);

Output:

* Tag names:
Array
(
    [0] => tag1
    [1] => tag2
    [2] => tag2
    [3] => tag3
)
* Tag content:
Array
(
    [0] => while this is inside a tag2{tag}
    [1] => tag
    [2] => also with a tag  tag3{inside} of it
    [3] => inside
)
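
To make the expression above easier to read, here is a minimal sketch of the same pattern written in extended mode (the `x` flag) with comments. The layout, the comments, and the `$pairs` variable are illustrative additions, not part of the original answer. The key point is that the braces and content sit inside a lookahead, so only the tag name is consumed and the scan continues into the content, which is how nested tags are found as well.

```php
<?php
$str = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag2{also with a tag  tag3{inside} of it}.";

// Same expression as above, spread out with the x flag.
// Braces are escaped here only for readability.
$re = '/
    \b(\w+)                # group 1: the tag name
    (?=                    # lookahead: checks the braces without consuming them
        \{
        (                  # group 2: the tag content
            (?:
                [^{}]+     # a run of characters that are not braces
              | \{(?2)\}   # or a nested pair of braces, recursing into group 2
            )*
        )
        \}
    )
/x';

preg_match_all($re, $str, $matches, PREG_SET_ORDER);

// $pairs is a hypothetical helper structure: one [name, content] entry per tag.
$pairs = array();
foreach ($matches as $m) {
    $pairs[] = array($m[1], $m[2]);
}

foreach ($pairs as $pair) {
    echo $pair[0], ' => ', $pair[1], "\n";
}
```

This should list the same four name and content pairs as the output above. Because the pattern relies on balanced braces, a tag whose closing brace is missing is simply skipped rather than reported.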

Answer 1 (score: 2)

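I don't know whether there is a single regex that gets all inner and outer tags in one call, but you can take this regex from the linked question, /\{(([^\{\}]+)|(?R))*\}/, and iterate recursively over the results.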

I added the tag name and some named subpatterns to the regex to make it clearer:

function search_tags($string, $recursion = 0) {
    $Results = array();
    if (preg_match_all("/(?<tagname>[\w]+)\{(?<content>(([^\{\}]+)|(?R))*)\}/", $string, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $Results[] = array('match' => $match[0], 'tagname' => $match['tagname'], 'content' => $match['content'], 'deepness' => $recursion);
            if ($InnerResults = search_tags($match['content'], $recursion+1)) {
                $Results = array_merge($Results, $InnerResults);
            }
        }
        return $Results;
    }
    return false;
}

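This returns an array of all matches, each holding the whole match, the tag name, the content inside the braces, and a counter (deepness) showing how many levels deep the match is nested inside other tags. I added another level of nesting to your string for demonstration: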

$text = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag3{also with a tag tag4{and another nested tag5{inside}} of it}.";
echo '<pre>'.print_r(search_tags($text), true).'</pre>';

The output will be:

Array
(
    [0] => Array
        (
            [match] => tag1{while this is inside a tag2{tag}}
            [tagname] => tag1
            [content] => while this is inside a tag2{tag}
            [deepness] => 0
        )

    [1] => Array
        (
            [match] => tag2{tag}
            [tagname] => tag2
            [content] => tag
            [deepness] => 1
        )

    [2] => Array
        (
            [match] => tag3{also with a tag tag4{and another nested tag5{inside}} of it}
            [tagname] => tag3
            [content] => also with a tag tag4{and another nested tag5{inside}} of it
            [deepness] => 0
        )

    [3] => Array
        (
            [match] => tag4{and another nested tag5{inside}}
            [tagname] => tag4
            [content] => and another nested tag5{inside}
            [deepness] => 1
        )

    [4] => Array
        (
            [match] => tag5{inside}
            [tagname] => tag5
            [content] => inside
            [deepness] => 2
        )

)
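
If a nested structure is more convenient than the flat list with a depth counter, the same recursive idea can be sketched as a function that attaches the inner tags to their parent. This is only an illustration: `parse_tags` and the `children` key are hypothetical names, not part of the answer, and the pattern is a slightly trimmed form of the one used above.

```php
<?php
// Hypothetical variant of the recursive search: returns a tree instead of a flat list.
function parse_tags(string $text): array {
    $nodes = array();
    // tagname{content}, where content is non-brace text or further tagname{...} groups
    $re = '/(?<tagname>\w+)\{(?<content>(?:[^{}]+|(?R))*)\}/';
    if (preg_match_all($re, $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $nodes[] = array(
                'tagname'  => $match['tagname'],
                'content'  => $match['content'],
                // Recurse into the content to pick up the tags nested one level down.
                'children' => parse_tags($match['content']),
            );
        }
    }
    return $nodes; // empty array when the text contains no tags
}

$tree = parse_tags('a tag1{x tag2{y}} b tag3{z}');
print_r($tree);
```

Returning an empty array rather than false lets callers loop over the result without a separate check. Like the original function, this assumes balanced braces and that every opening brace is preceded by a tag name.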

Answer 2 (score: 1)

The regex would be something like this:

tag[0-9]+\{[^\}]+

You should replace the inner tags first.
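
One possible reading of "replace the inner tags first" is sketched below. It is an interpretation, not the answerer's code: the function name `innermost_first`, the `<name>` placeholder format, and the pattern (which accepts any word as a tag name and requires the closing brace) are all assumptions made for illustration. Each pass matches a tag whose content contains no braces, records it, and replaces it with a placeholder so that the enclosing tag becomes brace-free in turn.

```php
<?php
// Hypothetical sketch: collect tags from the inside out.
function innermost_first(string $text): array {
    $found = array();
    // A tag whose content has no braces, i.e. an innermost tag.
    $re = '/(\w+)\{([^{}]*)\}/';
    while (preg_match($re, $text, $m, PREG_OFFSET_CAPTURE)) {
        $found[] = array('tagname' => $m[1][0], 'content' => $m[2][0]);
        // Swap the matched tag for a brace-free placeholder such as <tag2>,
        // so the loop removes one pair of braces per pass and always terminates.
        $text = substr_replace($text, '<' . $m[1][0] . '>', $m[0][1], strlen($m[0][0]));
    }
    return $found;
}

$found = innermost_first('tag1{a tag2{b}}');
print_r($found);
```

The trade-off is that the content recorded for an outer tag contains placeholders instead of the original nested text, so this suits cases where the tags are processed and substituted from the inside out.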

Answer 3 (score: 0)

I don't think there is any other way. You need to walk through each bracket.

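// $input is assumed to hold the text to scan; the sample string from the question is used here.
$input = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag2{also with a tag  tag3{inside} of it}.";
$output = array();
$pos = 0;
// Find the next "tagN{" opener, searching from $pos onward.
while (preg_match('/tag\d+\{/S', $input, $match, PREG_OFFSET_CAPTURE, $pos)) {
    $start = $match[0][1];
    // Resume the outer search right after this opener so nested tags are found too.
    $pos = $offset = $start + strlen($match[0][0]);
    $bracket = 1;
    // Step from brace to brace until the opener is balanced.
    while ($bracket !== 0 and preg_match('/\{|\}/S', $input, $found, PREG_OFFSET_CAPTURE, $offset)) {
        ($found[0][0] === '}') ? $bracket-- : $bracket++;
        $offset = $found[0][1] + 1;
    }
    // Store the whole tag, from its name through its matching closing brace.
    $output[] = substr($input, $start, $offset - $start);
}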
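
The same "walk through each bracket" idea can also be sketched without regular expressions for the brace matching, using a stack of open tags and a single pass over the string. The function name `scan_tags` and the returned array layout are hypothetical, and the sketch assumes a tag name is the run of letters, digits, and underscores directly before an opening brace.

```php
<?php
// Hypothetical single-pass scanner: a stack tracks the tags that are currently open.
function scan_tags(string $input): array {
    $stack = array(); // entries: [tag name, offset of the first content character]
    $found = array();
    $len = strlen($input);
    for ($i = 0; $i < $len; $i++) {
        if ($input[$i] === '{') {
            // Walk backwards over word characters to recover the tag name.
            $j = $i;
            while ($j > 0 && (ctype_alnum($input[$j - 1]) || $input[$j - 1] === '_')) {
                $j--;
            }
            $stack[] = array(substr($input, $j, $i - $j), $i + 1);
        } elseif ($input[$i] === '}' && $stack) {
            // A closing brace ends the most recently opened tag.
            list($name, $start) = array_pop($stack);
            $found[] = array(
                'tagname' => $name,
                'content' => substr($input, $start, $i - $start),
            );
        }
    }
    return $found;
}

$tags = scan_tags('x tag1{a tag2{b}} y');
print_r($tags);
```

Tags are reported in the order they close, so inner tags come before the tags that contain them. Unmatched closing braces are ignored, and tags left open at the end of the input are not reported.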