Need confirmation of capture group numbering when nesting branch resets

Time: 2012-06-12 16:48:17

Tags: regex

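When nesting branch resets, I have guessed as best I can at how the capture group numbering works. I searched the internet for information on nested branch resets but could not find a clear confirmation.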

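My main concern is the numbering sequence that applies immediately on entering an inner nested branch reset. The sample below is my best guess. If anyone can confirm it is correct, or steer me in the right direction, it would be appreciated.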

Example regex:

(a)(?|x(y)z(?|(u)(u)(u)(u)(u)(u)|(e)(e)(e)|(c))(K)|(p(q(?|(M)(M)(M)(M)(?|(T)(T)(T)|(D)(D))(R)(R)|(B)(B)(B)|(v)))r)(o)(i)|(t)s(w))(Z)

Regex annotated with group numbers:

1    ( a )
     (?|
          x
2         ( y )
          z
          (?|
3              ( u )
4              ( u )
5              ( u )
6              ( u )
7              ( u )
8              ( u )
            |  
3              ( e )
4              ( e )
5              ( e )
            |  
3              ( c )
          )
9         ( K )
       |  
2         (
               p
  3            (
                    q
                    (?|
    4                    ( M )
    5                    ( M )
    6                    ( M )
    7                    ( M )
                         (?|
    8                         ( T )
    9                         ( T )
    10                        ( T )
                           |  
    8                         ( D )
    9                         ( D )
                         )
    11                   ( R )
    12                   ( R )
                      |  
    4                    ( B )
    5                    ( B )
    6                    ( B )
                      |  
    4                    ( v )
                    )
  3            )
               r
2         )
13        ( o )
14        ( i )
       |  
2         ( t )
          s
3         ( w )
     )
15   ( Z )

Perl test case:

Formatted:

 # (a)(?|x(y)z(?|(u)(u)(u)(u)(u)(u)|(e)(e)(e)|(c))(K)|(p(q(?|(M)(M)(M)(M)(?|(T)(T)(T)|(D)(D))(R)(R)|(B)(B)(B)|(v)))r)(o)(i)|(t)s(w))(Z)

 ( a )                         # (1)
 (?|
      x
      ( y )                         # (2)
      z
      (?|
           ( u )                         # (3)
           ( u )                         # (4)
           ( u )                         # (5)
           ( u )                         # (6)
           ( u )                         # (7)
           ( u )                         # (8)
        |  
           ( e )                         # (3)
           ( e )                         # (4)
           ( e )                         # (5)
        |  
           ( c )                         # (3)
      )
      ( K )                         # (9)
   |  
      (                             # (2 start)
           p
           (                             # (3 start)
                q
                (?|
                     ( M )                         # (4)
                     ( M )                         # (5)
                     ( M )                         # (6)
                     ( M )                         # (7)
                     (?|
                          ( T )                         # (8)
                          ( T )                         # (9)
                          ( T )                         # (10)
                       |  
                          ( D )                         # (8)
                          ( D )                         # (9)
                     )
                     ( R )                         # (11)
                     ( R )                         # (12)
                  |  
                     ( B )                         # (4)
                     ( B )                         # (5)
                     ( B )                         # (6)
                  |  
                     ( v )                         # (4)
                )
           )                             # (3 end)
           r
      )                             # (2 end)
      ( o )                         # (13)
      ( i )                         # (14)
   |  
      ( t )                         # (2)
      s
      ( w )                         # (3)
 )
 ( Z )                         # (15)

Perl engine results:
Input

axyzuuuuuuKZ
axyzeeeKZ
axyzcKZ
apqMMMMTTTRRroiZ
apqMMMMDDRRroiZ
apqBBBroiZ
apqvroiZ
atswZ

Output

 **  Grp 0 -  ( pos 0 , len 12 ) 
axyzuuuuuuKZ  
 **  Grp 1 -  ( pos 0 , len 1 ) 
a  
 **  Grp 2 -  ( pos 2 , len 1 ) 
y  
 **  Grp 3 -  ( pos 4 , len 1 ) 
u  
 **  Grp 4 -  ( pos 5 , len 1 ) 
u  
 **  Grp 5 -  ( pos 6 , len 1 ) 
u  
 **  Grp 6 -  ( pos 7 , len 1 ) 
u  
 **  Grp 7 -  ( pos 8 , len 1 ) 
u  
 **  Grp 8 -  ( pos 9 , len 1 ) 
u  
 **  Grp 9 -  ( pos 10 , len 1 ) 
K  
 **  Grp 10 -  NULL 
 **  Grp 11 -  NULL 
 **  Grp 12 -  NULL 
 **  Grp 13 -  NULL 
 **  Grp 14 -  NULL 
 **  Grp 15 -  ( pos 11 , len 1 ) 
Z  

-----------------------

 **  Grp 0 -  ( pos 14 , len 9 ) 
axyzeeeKZ  
 **  Grp 1 -  ( pos 14 , len 1 ) 
a  
 **  Grp 2 -  ( pos 16 , len 1 ) 
y  
 **  Grp 3 -  ( pos 18 , len 1 ) 
e  
 **  Grp 4 -  ( pos 19 , len 1 ) 
e  
 **  Grp 5 -  ( pos 20 , len 1 ) 
e  
 **  Grp 6 -  NULL 
 **  Grp 7 -  NULL 
 **  Grp 8 -  NULL 
 **  Grp 9 -  ( pos 21 , len 1 ) 
K  
 **  Grp 10 -  NULL 
 **  Grp 11 -  NULL 
 **  Grp 12 -  NULL 
 **  Grp 13 -  NULL 
 **  Grp 14 -  NULL 
 **  Grp 15 -  ( pos 22 , len 1 ) 
Z  

-----------------------

 **  Grp 0 -  ( pos 25 , len 7 ) 
axyzcKZ  
 **  Grp 1 -  ( pos 25 , len 1 ) 
a  
 **  Grp 2 -  ( pos 27 , len 1 ) 
y  
 **  Grp 3 -  ( pos 29 , len 1 ) 
c  
 **  Grp 4 -  NULL 
 **  Grp 5 -  NULL 
 **  Grp 6 -  NULL 
 **  Grp 7 -  NULL 
 **  Grp 8 -  NULL 
 **  Grp 9 -  ( pos 30 , len 1 ) 
K  
 **  Grp 10 -  NULL 
 **  Grp 11 -  NULL 
 **  Grp 12 -  NULL 
 **  Grp 13 -  NULL 
 **  Grp 14 -  NULL 
 **  Grp 15 -  ( pos 31 , len 1 ) 
Z  

-----------------------

 **  Grp 0 -  ( pos 34 , len 16 ) 
apqMMMMTTTRRroiZ  
 **  Grp 1 -  ( pos 34 , len 1 ) 
a  
 **  Grp 2 -  ( pos 35 , len 12 ) 
pqMMMMTTTRRr  
 **  Grp 3 -  ( pos 36 , len 10 ) 
qMMMMTTTRR  
 **  Grp 4 -  ( pos 37 , len 1 ) 
M  
 **  Grp 5 -  ( pos 38 , len 1 ) 
M  
 **  Grp 6 -  ( pos 39 , len 1 ) 
M  
 **  Grp 7 -  ( pos 40 , len 1 ) 
M  
 **  Grp 8 -  ( pos 41 , len 1 ) 
T  
 **  Grp 9 -  ( pos 42 , len 1 ) 
T  
 **  Grp 10 -  ( pos 43 , len 1 ) 
T  
 **  Grp 11 -  ( pos 44 , len 1 ) 
R  
 **  Grp 12 -  ( pos 45 , len 1 ) 
R  
 **  Grp 13 -  ( pos 47 , len 1 ) 
o  
 **  Grp 14 -  ( pos 48 , len 1 ) 
i  
 **  Grp 15 -  ( pos 49 , len 1 ) 
Z  

-----------------------

 **  Grp 0 -  ( pos 52 , len 15 ) 
apqMMMMDDRRroiZ  
 **  Grp 1 -  ( pos 52 , len 1 ) 
a  
 **  Grp 2 -  ( pos 53 , len 11 ) 
pqMMMMDDRRr  
 **  Grp 3 -  ( pos 54 , len 9 ) 
qMMMMDDRR  
 **  Grp 4 -  ( pos 55 , len 1 ) 
M  
 **  Grp 5 -  ( pos 56 , len 1 ) 
M  
 **  Grp 6 -  ( pos 57 , len 1 ) 
M  
 **  Grp 7 -  ( pos 58 , len 1 ) 
M  
 **  Grp 8 -  ( pos 59 , len 1 ) 
D  
 **  Grp 9 -  ( pos 60 , len 1 ) 
D  
 **  Grp 10 -  NULL 
 **  Grp 11 -  ( pos 61 , len 1 ) 
R  
 **  Grp 12 -  ( pos 62 , len 1 ) 
R  
 **  Grp 13 -  ( pos 64 , len 1 ) 
o  
 **  Grp 14 -  ( pos 65 , len 1 ) 
i  
 **  Grp 15 -  ( pos 66 , len 1 ) 
Z  

-----------------------

 **  Grp 0 -  ( pos 69 , len 10 ) 
apqBBBroiZ  
 **  Grp 1 -  ( pos 69 , len 1 ) 
a  
 **  Grp 2 -  ( pos 70 , len 6 ) 
pqBBBr  
 **  Grp 3 -  ( pos 71 , len 4 ) 
qBBB  
 **  Grp 4 -  ( pos 72 , len 1 ) 
B  
 **  Grp 5 -  ( pos 73 , len 1 ) 
B  
 **  Grp 6 -  ( pos 74 , len 1 ) 
B  
 **  Grp 7 -  NULL 
 **  Grp 8 -  NULL 
 **  Grp 9 -  NULL 
 **  Grp 10 -  NULL 
 **  Grp 11 -  NULL 
 **  Grp 12 -  NULL 
 **  Grp 13 -  ( pos 76 , len 1 ) 
o  
 **  Grp 14 -  ( pos 77 , len 1 ) 
i  
 **  Grp 15 -  ( pos 78 , len 1 ) 
Z  

-----------------------

 **  Grp 0 -  ( pos 81 , len 8 ) 
apqvroiZ  
 **  Grp 1 -  ( pos 81 , len 1 ) 
a  
 **  Grp 2 -  ( pos 82 , len 4 ) 
pqvr  
 **  Grp 3 -  ( pos 83 , len 2 ) 
qv  
 **  Grp 4 -  ( pos 84 , len 1 ) 
v  
 **  Grp 5 -  NULL 
 **  Grp 6 -  NULL 
 **  Grp 7 -  NULL 
 **  Grp 8 -  NULL 
 **  Grp 9 -  NULL 
 **  Grp 10 -  NULL 
 **  Grp 11 -  NULL 
 **  Grp 12 -  NULL 
 **  Grp 13 -  ( pos 86 , len 1 ) 
o  
 **  Grp 14 -  ( pos 87 , len 1 ) 
i  
 **  Grp 15 -  ( pos 88 , len 1 ) 
Z  

-----------------------

 **  Grp 0 -  ( pos 91 , len 5 ) 
atswZ  
 **  Grp 1 -  ( pos 91 , len 1 ) 
a  
 **  Grp 2 -  ( pos 92 , len 1 ) 
t  
 **  Grp 3 -  ( pos 94 , len 1 ) 
w  
 **  Grp 4 -  NULL 
 **  Grp 5 -  NULL 
 **  Grp 6 -  NULL 
 **  Grp 7 -  NULL 
 **  Grp 8 -  NULL 
 **  Grp 9 -  NULL 
 **  Grp 10 -  NULL 
 **  Grp 11 -  NULL 
 **  Grp 12 -  NULL 
 **  Grp 13 -  NULL 
 **  Grp 14 -  NULL 
 **  Grp 15 -  ( pos 95 , len 1 ) 
Z  

1 Answer:

Answer 0 (score: 2)

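That looks correct. The number of capture groups a branch reset contributes equals the largest number of capture groups in any one of its branches.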

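Here is the quote from perlre:

  The numbering within each branch will be as normal, and any groups following this construct will be numbered as though the construct contained only one branch, that being the one with the most capture groups in it.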
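
To make the rule concrete, here is a minimal Python sketch that applies it to the regex from the question and reports the number each capturing parenthesis would get. It is not how Perl implements numbering; it is a hand-written counter, and the name `number_groups` is made up for this illustration. It assumes a simplified pattern syntax: no character classes, and every `(?...)` other than `(?|` is treated as non-capturing (so named groups are not counted).

```python
def number_groups(pattern):
    """Return the number given to each capturing '(' in order of appearance.

    Simplified sketch: '(?|' opens a branch reset, any other '(?' is treated
    as non-capturing, and character classes are not handled.
    """
    numbers = []   # group number for each capturing '(' as it is met
    n = 0          # last group number handed out
    stack = []     # open groups: ["plain"] or ["reset", start, highest]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":                 # skip an escaped character
            i += 2
            continue
        if ch == "(":
            if pattern.startswith("(?|", i):
                # Remember the counter on entry and the highest value seen.
                stack.append(["reset", n, n])
                i += 3
                continue
            if pattern.startswith("(?", i):
                stack.append(["plain"])  # assumed non-capturing
                i += 2
                continue
            n += 1
            numbers.append(n)
            stack.append(["plain"])
        elif ch == "|" and stack and stack[-1][0] == "reset":
            # A new branch of the innermost open branch reset:
            # record the high-water mark, then restart from the entry value.
            frame = stack[-1]
            frame[2] = max(frame[2], n)
            n = frame[1]
        elif ch == ")":
            frame = stack.pop()
            if frame[0] == "reset":
                # Continue after the construct as if only the branch with
                # the most groups had existed.
                n = max(frame[2], n)
        i += 1
    return numbers


PATTERN = ("(a)(?|x(y)z(?|(u)(u)(u)(u)(u)(u)|(e)(e)(e)|(c))(K)"
           "|(p(q(?|(M)(M)(M)(M)(?|(T)(T)(T)|(D)(D))(R)(R)|(B)(B)(B)|(v)))r)(o)(i)"
           "|(t)s(w))(Z)")

if __name__ == "__main__":
    print(number_groups(PATTERN))
```

Under these assumptions the counter gives `(K)` the number 9, `(R)(R)` the numbers 11 and 12, `(o)(i)` the numbers 13 and 14, and `(Z)` the number 15, which agrees with the annotated listing and the engine output in the question.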