Perl RegExp: match only the outermost expression

Date: 2015-08-09 21:38:24

Tags: regex perl

Here is a simplified example of the string I want to parse:

$my_string = "000 AAA 111 ZZZ AAA 222 AAA 333 ZZZ ZZZ 444"

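I want to retrieve the content between AAA and ZZZ, but ignore anything inside a nested AAA / ZZZ pair. So in the example above I want 111 and 222 (they are numbers here, but they could be any alphanumeric text other than AAA or ZZZ), and I want to skip 333 because it sits inside a nested AAA / ZZZ. There can be any number of nested AAA / ZZZ pairs. For example:

$my_string2 = "AAA 1 AAA 2 AAA 3 AAA 4 ZZZ ZZZ ZZZ ZZZ"

In this second example, I only want 1.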

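1 answer:

Answer 0 (score: 4)

Below is an example of recursive parsing. In this case, you are only interested in the level-1 content.

** Added a sample section for parsing either all cores or only level-1 cores (the latter speeds things up).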

 # (?s)(?:((?&content))|AAA((?&core)|)ZZZ|((?:AAA|ZZZ)))(?(DEFINE)(?<core>(?>(?&content)|AAA(?:(?=.)(?&core)|)ZZZ)+)(?<content>(?>(?!(?:AAA|ZZZ)).)+))

 # //////////////////////////////////////////////////////
 # // The General Guide to 3-Part Recursive Parsing
 # // ----------------------------------------------
 # // Part 1. CONTENT
 # // Part 2. CORE
 # // Part 3. ERRORS

 (?s)

 (?:
      (                                  # (1), Take off CONTENT
           (?&content) 
      )
   |                                   # OR
      AAA                                # Start-Delimiter
      (                                  # (2), Take off The CORE
           (?&core) 
        |  
      )
      ZZZ                                # End-Delimiter

   |                                   # OR
      (                                  # (3), Take off Unbalanced (delimiter) ERRORS
           (?: AAA | ZZZ )
      )
 )

 # ///////////////////////
 # // Subroutines
 # // ---------------

 (?(DEFINE)

      # core
      (?<core>
           (?>
                (?&content) 
             |  
                AAA 
                # recurse core
                (?:
                     (?= . )
                     (?&core) 
                  |  
                )
                ZZZ
           )+
      )

      # content 
      (?<content>
           (?>
                (?!
                     (?: AAA | ZZZ )
                )
                . 
           )+
      )

 )
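
To see the two subroutines at work in isolation, here is a minimal sketch (Perl 5.10 or later) that pulls out only the outermost AAA ... ZZZ blocks. It is a simplification of the pattern above, not a replacement for it: the ERRORS alternative is dropped, `(?&core)?` stands in for the `(?= . )(?&core) |` construct, and the names `$outer` and `outermost_blocks` are made up for this illustration.

```perl
use strict;
use warnings;

# Simplified pattern: an outermost AAA ... ZZZ pair whose inside is captured in $1.
# The named groups live in (?(DEFINE)...) so they are only ever called, never
# matched in place.
my $outer = qr/
    AAA ( (?&core)? ) ZZZ
    (?(DEFINE)
        # content: any run of characters that does not start a delimiter
        (?<content> (?> (?! AAA | ZZZ ) . )+ )
        # core: content and balanced nested pairs, in any order
        (?<core>    (?> (?&content) | AAA (?&core)? ZZZ )+ )
    )
/xs;

# Hypothetical helper: return the inside of every outermost block.
sub outermost_blocks {
    my ($text) = @_;
    my @blocks;
    while ( $text =~ /$outer/g ) {
        push @blocks, $1;
    }
    return @blocks;
}

print "[$_]\n" for outermost_blocks("000 AAA 111 ZZZ AAA 222 AAA 333 ZZZ ZZZ 444");
```

For the first sample string this yields two blocks, " 111 " and " 222 AAA 333 ZZZ ". The nested pair is still inside the second block, which is why the full code below re-parses each core and keeps only the content seen at level 1.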

Perl code:

use strict;
use warnings;

$/ = undef;
my $content = <DATA>;

# Set the error mode on/off here ..
my $BailOnError = 1;
my $IsError = 0;

my @vals = ();
my $level = 0;

ParseCore( $content );

print "\n@vals";
exit;

sub ParseCore
{
    my ($core) = @_;
    while ( $core =~ /(?s)(?:((?&content))|AAA((?&core)|)ZZZ|((?:AAA|ZZZ)))(?(DEFINE)(?<core>(?>(?&content)|AAA(?:(?=.)(?&core)|)ZZZ)+)(?<content>(?>(?!(?:AAA|ZZZ)).)+))/g )
    {
       if (defined $1)
       {
         # CONTENT
           if ( $level == 1 ) {
               push @vals, $1;
           }
       }
       elsif (defined $2)
       {
         # CORE
           my $k = $2;

           # To parse all cores:
           # -----------------------
           # ++$level;
           # ParseCore( $k );
           # --$level;

           # To parse just level 1 cores:
           # ----------------------------------
           if ( $level == 0 ) {
              ++$level;
              ParseCore( $k );
              --$level;
           }

           if ( $BailOnError && $IsError ) {
               last;
           }
       }
       else
       {
         # ERRORS
           print "Unbalanced '$3' at position = ", $-[0];
           $IsError = 1;

           # Decide to continue here ..
           # If BailOnError is set, just unwind recursion. 
           # -------------------------------------------------
           if ( $BailOnError ) {
              last;
           }
       }
    }
}

#================================================
__DATA__

000 AAA 111 ZZZ AAA 222 AAA 333 ZZZ ZZZ 444
AAA 1 AAA 2 AAA 3 AAA 4 ZZZ ZZZ ZZZ ZZZ

Output:

111   222     1
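
For comparison, the same level-1 values can also be collected without a recursive regex, by splitting on the delimiters and keeping a depth counter. This is a different technique from the answer above, sketched under a few assumptions: surrounding whitespace is trimmed, whitespace-only pieces are dropped, and a stray ZZZ at depth 0 is silently ignored rather than reported. The name `level_one_content` is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the trimmed text found at nesting depth 1.
sub level_one_content {
    my ($text) = @_;
    my $depth = 0;
    my @found;

    # A capturing group in split keeps the delimiters in the returned list.
    for my $piece ( split /(AAA|ZZZ)/, $text ) {
        if ( $piece eq 'AAA' ) {
            $depth++;
        }
        elsif ( $piece eq 'ZZZ' ) {
            $depth-- if $depth > 0;    # ignore an unbalanced closing delimiter
        }
        elsif ( $depth == 1 ) {
            ( my $trimmed = $piece ) =~ s/^\s+|\s+$//g;
            push @found, $trimmed if length $trimmed;
        }
    }
    return @found;
}

print join( ",", level_one_content("000 AAA 111 ZZZ AAA 222 AAA 333 ZZZ ZZZ 444") ), "\n";
print join( ",", level_one_content("AAA 1 AAA 2 AAA 3 AAA 4 ZZZ ZZZ ZZZ ZZZ") ), "\n";
```

The trade-off is that this version does no balance checking: an AAA that is never closed still contributes its level-1 text, whereas the recursive pattern routes such a delimiter to its ERRORS branch.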