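Replacing matching parentheses in a nested structure with RegExp

Date: 2016-05-13 16:26:58

Tags: javascript php regex parsing lexical-analysis

How can a matching pair of left/right parentheses be replaced when the opening parenthesis follows the keyword array? Can regular expressions help with this kind of problem?

To be more specific, I would like to solve this in JavaScript or PHP.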

// input
$data = array(
    'id' => nextId(),
    'profile' => array(
       'name' => 'Hugo Hurley',
       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    )
);

// desired output
$data = [
    'id' => nextId(),
    'profile' => [
       'name' => 'Hugo Hurley',
       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    ]
];

2 answers:

Answer 0: (score: 3)

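Tim Pietzcker gives the .NET counting version. It has the same elements as the PCRE (PHP) version below.

All the caveats are the same. In particular, the non-array parentheses must be balanced, because they use the same closing parenthesis as a delimiter.

All of the text must (or should) be parsed. The outer groups 1, 2, 3 and 4 let you get the parts:

CONTENT
CORE-1      array()
CORE-2      any ()
EXCEPTIONS

Each match gets you exactly one of these outer parts; they are mutually exclusive.

The trick is to define a PHP function parse( core ) that parses a CORE.
Inside that function is a while ( regex.search( core ) ) { .. } loop.

Each time the CORE-1 or CORE-2 group matches, call parse( core ) again, passing it the contents of that core group.

Inside the loop, just take off the CONTENT and assign it to the hash.

Obviously, the group 1 construct that calls (?&content) should be replaced with constructs that capture your hash-like variable data.

At a detailed level this can get very tedious: in general, you have to account for every single character to parse the whole thing correctly.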

(?is)(?:((?&content))|(?>\barray\s*\()((?=.)(?&core)|)\)|\(((?=.)(?&core)|)\)|(\barray\s*\(|[()]))(?(DEFINE)(?<core>(?>(?&content)|(?>\barray\s*\()(?:(?=.)(?&core)|)\)|\((?:(?=.)(?&core)|)\))+)(?<content>(?>(?!\barray\s*\(|[()]).)+))

Expanded

 # 1:  CONTENT
 # 2:  CORE-1
 # 3:  CORE-2
 # 4:  EXCEPTIONS

 (?is)

 (?:
      (                                  # (1), Take off   CONTENT
           (?&content) 
      )
   |                                   # OR -----------------------------
      (?>                                # Start 'array('
           \b array \s* \(
      )
      (                                  # (2), Take off   'array( CORE-1 )'
           (?= . )
           (?&core) 
        |  
      )
      \)                                 # End ')'
   |                                   # OR -----------------------------
      \(                                 # Start '('
      (                                  # (3), Take off   '( any CORE-2 )'
           (?= . )
           (?&core) 
        |  
      )
      \)                                 # End ')'
   |                                   # OR -----------------------------
      (                                  # (4), Take off   Unbalanced or Exceptions
           \b array \s* \(
        |  [()] 
      )
 )

 # Subroutines
 # ---------------

 (?(DEFINE)

      # core
      (?<core>
           (?>
                (?&content) 
             |  
                (?> \b array \s* \( )
                # recurse core of  array()
                (?:
                     (?= . )
                     (?&core) 
                  |  
                )
                \)
             |  
                \(
                # recurse core of any  ()
                (?:
                     (?= . )
                     (?&core) 
                  |  
                )
                \)
           )+
      )

      # content 
      (?<content>
           (?>
                (?!
                     \b array \s* \(
                  |  [()] 
                )
                . 
           )+
      )
 )

Output

 **  Grp 0           -  ( pos 0 , len 11 ) 
some_var =   
 **  Grp 1           -  ( pos 0 , len 11 ) 
some_var =   
 **  Grp 2           -  NULL 
 **  Grp 3           -  NULL 
 **  Grp 4 [core]    -  NULL 
 **  Grp 5 [content] -  NULL 

-----------------------

 **  Grp 0           -  ( pos 11 , len 153 ) 
array(
    'id' => nextId(),
    'profile' => array(
       'name' => 'Hugo Hurley',
       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    ) 
)  
 **  Grp 1           -  NULL 
 **  Grp 2           -  ( pos 17 , len 146 ) 

    'id' => nextId(),
    'profile' => array(
       'name' => 'Hugo Hurley',
       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    ) 

 **  Grp 3           -  NULL 
 **  Grp 4 [core]    -  NULL 
 **  Grp 5 [content] -  NULL 

-------------------------------------

 **  Grp 0           -  ( pos 164 , len 3 ) 
;

 **  Grp 1           -  ( pos 164 , len 3 ) 
;

 **  Grp 2           -  NULL 
 **  Grp 3           -  NULL 
 **  Grp 4 [core]    -  NULL 
 **  Grp 5 [content] -  NULL 
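
JavaScript's RegExp has neither recursion nor (?(DEFINE)) subroutines, so the pattern above cannot be ported as is. The same CONTENT / CORE / EXCEPTIONS split can be sketched there with a flat delimiter regex and an explicit stack standing in for the recursive (?&core) call. This is only an illustration under that assumption: the name convertArraySyntax is made up here, and, as with the regex, parentheses inside strings or comments are not treated specially.

```javascript
// Hypothetical helper, not part of the answer: a flat delimiter regex and an
// explicit stack stand in for the recursive (?&core) subroutine.
function convertArraySyntax(source) {
  // Delimiters only: 'array(' (CORE-1 opener), '(' (CORE-2 opener), or ')'.
  const delimiter = /\barray\s*\(|[()]/gi;
  const stack = []; // one entry per open group: true if opened by 'array('
  let out = '';
  let last = 0;

  for (const m of source.matchAll(delimiter)) {
    out += source.slice(last, m.index); // CONTENT: copied unchanged
    last = m.index + m[0].length;

    if (m[0] === ')') {
      if (stack.length === 0) {
        // EXCEPTIONS: a closer without a matching opener
        throw new SyntaxError(`Unbalanced ')' at position ${m.index}`);
      }
      out += stack.pop() ? ']' : ')';
    } else if (m[0] === '(') {
      stack.push(false);
      out += '(';
    } else {
      stack.push(true);
      out += '[';
    }
  }
  if (stack.length > 0) {
    throw new SyntaxError('Unbalanced opening parenthesis');
  }
  return out + source.slice(last);
}

const input = [
  "$data = array(",
  "    'id' => nextId(),",
  "    'profile' => array(",
  "       'name' => 'Hugo Hurley',",
  "       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108",
  "    )",
  ");",
].join('\n');

console.log(convertArraySyntax(input));
```

Because every closing parenthesis is resolved against the stack, a single pass over the text is enough and plain parentheses such as nextId() are left alone.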

An earlier incarnation, written for something else, to give an idea of the usage:

 # Perl code:
 # 
 #     use strict;
 #     use warnings;
 #     
 #     use Data::Dumper;
 #     
 #     $/ = undef;
 #     my $content = <DATA>;
 #     
 #     # Set the error mode on/off here ..
 #     my $BailOnError = 1;
 #     my $IsError = 0;
 #     
 #     my $href = {};
 #     
 #     ParseCore( $href, $content );
 #     
 #     #print Dumper($href);
 #     
 #     print "\n\n";
 #     print "\nBase======================\n";
 #     print $href->{content};
 #     print "\nFirst======================\n";
 #     print $href->{first}->{content};
 #     print "\nSecond======================\n";
 #     print $href->{first}->{second}->{content};
 #     print "\nThird======================\n";
 #     print $href->{first}->{second}->{third}->{content};
 #     print "\nFourth======================\n";
 #     print $href->{first}->{second}->{third}->{fourth}->{content};
 #     print "\nFifth======================\n";
 #     print $href->{first}->{second}->{third}->{fourth}->{fifth}->{content};
 #     print "\nSix======================\n";
 #     print $href->{six}->{content};
 #     print "\nSeven======================\n";
 #     print $href->{six}->{seven}->{content};
 #     print "\nEight======================\n";
 #     print $href->{six}->{seven}->{eight}->{content};
 #     
 #     exit;
 #     
 #     
 #     sub ParseCore
 #     {
 #         my ($aref, $core) = @_;
 #         my ($k, $v);
 #         while ( $core =~ /(?is)(?:((?&content))|(?><!--block:(.*?)-->)((?&core)|)<!--endblock-->|(<!--(?:block:.*?|endblock)-->))(?(DEFINE)(?<core>(?>(?&content)|(?><!--block:.*?-->)(?:(?&core)|)<!--endblock-->)+)(?<content>(?>(?!<!--(?:block:.*?|endblock)-->).)+))/g )
 #         {
 #            if (defined $1)
 #            {
 #              # CONTENT
 #                $aref->{content} .= $1;
 #            }
 #            elsif (defined $2)
 #            {
 #              # CORE
 #                $k = $2; $v = $3;
 #                $aref->{$k} = {};
 #      #         $aref->{$k}->{content} = $v;
 #      #         $aref->{$k}->{match} = $&;
 #                
 #                my $curraref = $aref->{$k};
 #                my $ret = ParseCore($aref->{$k}, $v);
 #                if ( $BailOnError && $IsError ) {
 #                    last;
 #                }
 #                if (defined $ret) {
 #                    $curraref->{'#next'} = $ret;
 #                }
 #            }
 #            else
 #            {
 #              # ERRORS
 #                print "Unbalanced '$4' at position = ", $-[0];
 #                $IsError = 1;
 #     
 #                # Decide to continue here ..
 #                # If BailOnError is set, just unwind recursion. 
 #                # -------------------------------------------------
 #                if ( $BailOnError ) {
 #                   last;
 #                }
 #            }
 #         }
 #         return $k;
 #     }
 #     
 #     #================================================
 #     __DATA__
 #     some html content here top base
 #     <!--block:first-->
 #         <table border="1" style="color:red;">
 #         <tr class="lines">
 #             <td align="left" valign="<--valign-->">
 #         <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
 #         <!--hello--> <--again--><!--world-->
 #         some html content here 1 top
 #         <!--block:second-->
 #             some html content here 2 top
 #             <!--block:third-->
 #                 some html content here 3 top
 #                 <!--block:fourth-->
 #                     some html content here 4 top
 #                     <!--block:fifth-->
 #                         some html content here 5a
 #                         some html content here 5b
 #                     <!--endblock-->
 #                 <!--endblock-->
 #                 some html content here 3a
 #                 some html content here 3b
 #             <!--endblock-->
 #             some html content here 2 bottom
 #         <!--endblock-->
 #         some html content here 1 bottom
 #     <!--endblock-->
 #     some html content here1-5 bottom base
 #     
 #     some html content here 6-8 top base
 #     <!--block:six-->
 #         some html content here 6 top
 #         <!--block:seven-->
 #             some html content here 7 top
 #             <!--block:eight-->
 #                 some html content here 8a
 #                 some html content here 8b
 #             <!--endblock-->
 #             some html content here 7 bottom
 #         <!--endblock-->
 #         some html content here 6 bottom
 #     <!--endblock-->
 #     some html content here 6-8 bottom base
 # 
 # Output >>
 # 
 #     Base======================
 #     some html content here top base
 #     
 #     some html content here1-5 bottom base
 #     
 #     some html content here 6-8 top base
 #     
 #     some html content here 6-8 bottom base
 #     
 #     First======================
 #     
 #         <table border="1" style="color:red;">
 #         <tr class="lines">
 #             <td align="left" valign="<--valign-->">
 #         <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
 #         <!--hello--> <--again--><!--world-->
 #         some html content here 1 top
 #         
 #         some html content here 1 bottom
 #     
 #     Second======================
 #     
 #             some html content here 2 top
 #             
 #             some html content here 2 bottom
 #         
 #     Third======================
 #     
 #                 some html content here 3 top
 #                 
 #                 some html content here 3a
 #                 some html content here 3b
 #             
 #     Fourth======================
 #     
 #                     some html content here 4 top
 #                     
 #                 
 #     Fifth======================
 #     
 #                         some html content here 5a
 #                         some html content here 5b
 #                     
 #     Six======================
 #     
 #         some html content here 6 top
 #         
 #         some html content here 6 bottom
 #     
 #     Seven======================
 #     
 #             some html content here 7 top
 #             
 #             some html content here 7 bottom
 #         
 #     Eight======================
 #     
 #                 some html content here 8a
 #                 some html content here 8b
 #         
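
The recursive shape of ParseCore can also be sketched in JavaScript. The version below is an assumption-laden illustration, not a port: it works on array( / ( / ) delimiters instead of the <!--block:...--> markers, recursion is done by the function itself because JavaScript's RegExp cannot recurse, and the names parseCore, kind, content and children are invented for the example.

```javascript
// Hypothetical sketch: parseCore and the node shape are illustrative names.
// Each call consumes one CORE and recurses for every nested group, much like
// ParseCore does in the Perl code.
function parseCore(source, start = 0, depth = 0) {
  const delimiter = /\barray\s*\(|[()]/gi;
  delimiter.lastIndex = start;
  const node = { kind: 'root', content: '', children: [] };
  let pos = start;
  let m;

  while ((m = delimiter.exec(source)) !== null) {
    node.content += source.slice(pos, m.index); // CONTENT at this level
    if (m[0] === ')') {
      if (depth === 0) {
        throw new SyntaxError(`Unbalanced ')' at position ${m.index}`);
      }
      return { node, end: delimiter.lastIndex }; // this CORE is closed
    }
    // An opener: recurse into the nested CORE, then resume after its ')'.
    const child = parseCore(source, delimiter.lastIndex, depth + 1);
    child.node.kind = m[0] === '(' ? 'paren' : 'array';
    node.children.push(child.node);
    pos = child.end;
    delimiter.lastIndex = pos;
  }
  if (depth > 0) {
    throw new SyntaxError('Unbalanced opening parenthesis');
  }
  node.content += source.slice(pos);
  return { node, end: source.length };
}

const { node: tree } = parseCore("$x = array('a' => (1 + 2), 'b' => array(3));");
console.log(JSON.stringify(tree, null, 2));
```

Each node keeps only the text that belongs to its own level, so nested groups show up as children rather than inside the parent's content, which is the same separation the Perl output above shows.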

Answer 1: (score: 2)

The following works (using the .NET regex engine):

resultString = Regex.Replace(subjectString, 
    @"\barray\(            # Match 'array('
    (                      # Capture in group 1:
     (?>                   # Start an atomic group:
      (?:                  # Either match
       (?!\barray\(|[()])  # only if we're not before another array or parens
       .                   # any character
      )+                   # once or more
     |                     # or
      \( (?<Depth>)        # match '(' (and increase the nesting counter)
     |                     # or
      \) (?<-Depth>)       # match ')' (and decrease the nesting counter).
     )*                    # Repeat as needed.
     (?(Depth)(?!))        # Assert that the nesting counter is at zero.
    )                      # End of capturing group.
    \)                     # Then match ')'.", 
    "[$1]", RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

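This regex matches array(...) where ... may contain anything except another array(...) (so it only matches the most deeply nested occurrences). It does allow other nested (and correctly balanced) parentheses within ..., but it does not check whether these are semantic parentheses or whether they are contained in strings or comments.

In other words, something like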

array(
   'name' => 'Hugo ((( Hurley',
   'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
)

would fail to match (correctly).
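
One possible way around the string limitation, sketched here in JavaScript rather than with the .NET regex, is to tokenize quoted strings as single units so that parentheses inside them are never counted. The sketch makes several assumptions: only single- and double-quoted literals with backslash escapes are recognized, comments and heredocs are not handled at all, and the function name is hypothetical.

```javascript
// Hypothetical variant, not from the answer: quoted strings are matched as
// whole tokens first, so parentheses inside them never reach the stack.
function convertArraySyntaxSkippingStrings(source) {
  const token =
    /'(?:[^'\\]|\\[\s\S])*'|"(?:[^"\\]|\\[\s\S])*"|\barray\s*\(|[()]/gi;
  const stack = []; // true if the open group was started by 'array('
  let out = '';
  let last = 0;

  for (const m of source.matchAll(token)) {
    out += source.slice(last, m.index); // text between tokens, unchanged
    last = m.index + m[0].length;
    const text = m[0];

    if (text[0] === "'" || text[0] === '"') {
      out += text; // string literal: copied verbatim
    } else if (text === ')') {
      if (stack.length === 0) {
        throw new SyntaxError(`Unbalanced ')' at position ${m.index}`);
      }
      out += stack.pop() ? ']' : ')';
    } else if (text === '(') {
      stack.push(false);
      out += '(';
    } else {
      stack.push(true);
      out += '[';
    }
  }
  if (stack.length > 0) {
    throw new SyntaxError('Unbalanced opening parenthesis');
  }
  return out + source.slice(last);
}

const tricky = [
  "array(",
  "   'name' => 'Hugo ((( Hurley',",
  "   'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108",
  ")",
].join('\n');

console.log(convertArraySyntaxSkippingStrings(tricky));
```

An apostrophe inside a comment would still confuse this tokenizer, so it narrows the caveat rather than removing it.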

You need to apply that .NET regex iteratively until it no longer modifies its input; in the case of your example, two iterations would suffice.
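
The iterate-until-stable loop can be sketched in JavaScript as well. JavaScript has no balancing groups, so this sketch swaps the (?<Depth>) counter for a pattern that spells out plain parentheses nested at most two levels deep inside an array(...); that depth limit and the names innermostArray and replaceArraysIteratively are assumptions made for the example.

```javascript
// Hypothetical sketch. Instead of counting with balancing groups, the pattern
// spells out plain parentheses up to two levels deep inside array(...).
const innermostArray =
  /\barray\(((?:(?!\barray\()[^()]|\((?:[^()]|\([^()]*\))*\))*)\)/g;

function replaceArraysIteratively(source) {
  let current = source;
  let passes = 0; // counts only passes that changed the text
  for (;;) {
    const next = current.replace(innermostArray, '[$1]');
    if (next === current) {
      return { result: current, passes };
    }
    current = next;
    passes += 1;
  }
}

const sample = [
  "$data = array(",
  "    'id' => nextId(),",
  "    'profile' => array(",
  "       'name' => 'Hugo Hurley',",
  "       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108",
  "    )",
  ");",
].join('\n');

const { result, passes } = replaceArraysIteratively(sample);
console.log(result);
console.log(`modifying passes: ${passes}`);
```

Each pass rewrites only the array(...) groups that contain no further array(, so the loop peels the nesting from the inside out and stops on the first pass that changes nothing.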