How do I match the main (top-level) subgroups in a regular expression?

Asked: 2014-09-30 16:32:43

Tags: regex preg-match

Given this string:

"example( other(1), 123, [25]).othermethod(456)"

How can I capture only the arguments of the main (outermost) functions:

"other(1), 123, [25]" and "456"

This is what I am trying: http://regex101.com/r/cR0uS9/2

The same goes for an HTML example. Given this:

<div>
    <div>
        <div>12</div>
        <div>34</div>
    </div>
</div>
<div>56</div>

I want to get:

<div>
    <div>12</div>
    <div>34</div>
</div>

and 56 as the second match.

2 answers:

Answer 0: (score: 1)

Here is a pattern that does not use recursion:

\w+\s*\((?P<parameters>(?:(?:(?:[^()]*\([^()]*\))+|[^()]*)(?:,(?!\s*\))|(?=\))))*)\)

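Caveats:

  1. It does not support more than 2 levels of nested parentheses, e.g. a(b(c())).
  2. String literals that contain ( or ) will trip it up, e.g. a(")").
  3. You will find the arguments in the group named "parameters".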

    Demo.

    Explanation:

    \w+ # function name
    \s* # white space
    \(
    (?P<parameters> # parameters:
        (?:
            # two possibilities:    1: a simple parameter, like "12", "'hello'", or "3*1+2"
            #                       2: the parameter contains braces.
            # we'll try to consume pairs of braces. If that fails, we'll simply match a parameter.
            (?:
                (?: # match a pair of braces ()
                    [^()]*
                    \(
                    [^()]*
                    \)
                )+ # consume as many pairs of braces as possible. Make sure there's at least one, though, because we can't go matching nothing.
            |
                [^()]* # since there are no more (pairs of) braces, simply consume the function's parameters.
            )
    
            # next, either consume a "," or assert there's a ")"
            (?:
                ,
                (?! # make sure there is another parameter after the comma
                    \s*
                    \)
                )
            |
                (?=
                    \)
                )
            )
        )*
    )
    \)
    

    P.S.: I have not managed to come up with an acceptable pattern for the HTML example.
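To try the non-recursive pattern outside regex101, here is a minimal sketch in Python. The `(?P<parameters>...)` named-group syntax is also accepted by Python's built-in `re` module, so the pattern is copied verbatim. The helper name `main_arguments` and the `.strip()` call are my own additions for illustration, not part of the answer.

```python
import re

# Pattern copied verbatim from the answer above, split over two lines for readability.
PATTERN = re.compile(
    r"\w+\s*\((?P<parameters>(?:(?:(?:[^()]*\([^()]*\))+|[^()]*)"
    r"(?:,(?!\s*\))|(?=\))))*)\)"
)


def main_arguments(text):
    """Return the argument string of each outermost call found in text.

    Hypothetical helper: surrounding whitespace is stripped for convenience.
    """
    return [m.group("parameters").strip() for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "example( other(1), 123, [25]).othermethod(456)"
    for args in main_arguments(sample):
        print(args)
```

The caveats listed above still apply: deeper nesting and parentheses inside string literals are not handled.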

Answer 1: (score: 0)

This one uses recursion. Use it with a global (find-all) match function, for example preg_match_all in PHP.

 # '~(?is)(?:([a-z]\w*)\s*\(((?&core)|)\))(?(DEFINE)(?<core>(?>(?&content)|(?:[a-z]\w*\s*\(|\()(?:(?=.)(?&core)|)\))+)(?<content>(?>(?![a-z]\w*\s*\(|[()]).)+))~'

 (?xis-)
 (?:
      ( [a-z] \w* )         # (1), Start-Delimiter, Function
      \s* \(                   
      (                     # (2), CORE
           (?&core) 
        |  
      )
      \)                    # End-Delimiter, close paren
 )

 # ///////////////////////
 # // Subroutines
 # // ---------------

 (?(DEFINE)

      # core
      (?<core>
           (?>
                (?&content) 
             |  
                (?:                   # Start-Delimiter
                     [a-z] \w* \s* \(      # Function
                  |  \(                    # Or, a open paren
                )
                (?:
                     (?= . )
                     (?&core)              # Recurse core
                  |  
                )
                \)                    # End-Delimiter, close paren
           )+
      )

      # content 
      (?<content>
           (?>
                (?!
                     [a-z] \w* \s* \(
                  |  [()] 
                )
                . 
           )+
      )
 )

Output:

 **  Grp 0 -  ( pos 0 , len 29 ) 
example( other(1), 123, [25])  
 **  Grp 1 -  ( pos 0 , len 7 ) 
example  
 **  Grp 2 -  ( pos 8 , len 20 ) 
 other(1), 123, [25]  
 **  Grp 3 -  NULL 
 **  Grp 4 -  NULL 

-----------------------

 **  Grp 0 -  ( pos 30 , len 16 ) 
othermethod(456)  
 **  Grp 1 -  ( pos 30 , len 11 ) 
othermethod  
 **  Grp 2 -  ( pos 42 , len 3 ) 
456  
 **  Grp 3 -  NULL 
 **  Grp 4 -  NULL 
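If the regex engine at hand has no recursion support (Python's built-in `re` module, for instance, does not accept `(?&name)` subroutine calls), a plain depth counter can produce the same two results. This is a different technique from the recursive pattern above, not a translation of it. The function name `top_level_calls` is hypothetical, and the sketch assumes the function name sits directly before its opening parenthesis and that no parentheses appear inside string literals.

```python
def top_level_calls(text):
    """Return (name, arguments) pairs for each outermost name(...) call.

    Hypothetical helper: tracks parenthesis depth instead of using regex recursion.
    """
    results = []
    i = 0
    n = len(text)
    while i < n:
        open_pos = text.find("(", i)
        if open_pos == -1:
            break
        # Walk backwards over the identifier that precedes the parenthesis.
        start = open_pos
        while start > i and (text[start - 1].isalnum() or text[start - 1] == "_"):
            start -= 1
        name = text[start:open_pos]
        # Scan forward until the matching close parenthesis is found.
        depth = 0
        j = open_pos
        while j < n:
            if text[j] == "(":
                depth += 1
            elif text[j] == ")":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        if depth != 0:
            break  # unbalanced input: stop rather than guess
        results.append((name, text[open_pos + 1:j]))
        i = j + 1
    return results


if __name__ == "__main__":
    for name, args in top_level_calls("example( other(1), 123, [25]).othermethod(456)"):
        print(name, "->", repr(args))
```

As in the regex output above, the arguments are returned exactly as written, including the leading space in the first call.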

For the HTML divs:

 # '~(?s)(?:<div>((?&core)|)</div>)(?(DEFINE)(?<core>(?>(?&content)|<div>(?:(?=.)(?&core)|)</div>)+)(?<content>(?>(?!</?div>).)+))~'

 (?xs-)
 (?:
      <div>                 # Start-Delimiter <div>
      (                     # (1), CORE
           (?&core) 
        |  
      )
      </div>                # End-Delimiter </div>
 )

 # ///////////////////////
 # // Subroutines
 # // ---------------

 (?(DEFINE)

      # core
      (?<core>
           (?>
                (?&content) 
             |  
                <div>                 # Start-Delimiter <div>
                (?:
                     (?= . )
                     (?&core)              # Recurse core
                  |  
                )
                </div>                # End-Delimiter  </div>
           )+
      )

      # content 
      (?<content>
           (?>
                (?! </?div> )
                . 
           )+
      )
 )

Output:

 **  Grp 0 -  ( pos 0 , len 82 ) 
<div>
    <div>
        <div>12</div>
        <div>34</div>
    </div>
</div>  
 **  Grp 1 -  ( pos 5 , len 71 ) 

    <div>
        <div>12</div>
        <div>34</div>
    </div>

 **  Grp 2 -  NULL 
 **  Grp 3 -  NULL 

---------------------------

 **  Grp 0 -  ( pos 84 , len 13 ) 
<div>56</div>  
 **  Grp 1 -  ( pos 89 , len 2 ) 
56  
 **  Grp 2 -  NULL 
 **  Grp 3 -  NULL
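The same depth-counting idea can be sketched for the div case in Python, again as an alternative for engines without recursion rather than as the method used above. It assumes, like the pattern, that tags are written exactly as `<div>` and `</div>` with no attributes. The name `top_level_divs` is hypothetical. For real-world HTML a proper parser is usually the safer choice.

```python
import re


def top_level_divs(html):
    """Return the inner content of each outermost <div>...</div> pair.

    Hypothetical helper: only bare <div> and </div> tags are recognised.
    """
    contents = []
    depth = 0
    start = None
    for tag in re.finditer(r"</?div>", html):
        if tag.group() == "<div>":
            if depth == 0:
                start = tag.end()  # content begins right after the outer <div>
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                contents.append(html[start:tag.start()])
    return contents


if __name__ == "__main__":
    sample = "<div><div><div>12</div><div>34</div></div></div><div>56</div>"
    for content in top_level_divs(sample):
        print(repr(content))
```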