Regex to match an optional expression after any number of ignored lines

Date: 2014-03-17 19:03:43

Tags: python regex parsing

I need to parse a config file that has some optional fields, only some of which I care about. I'm using Python's re.findall method.

Here is a config:

edit 750
    set srcintf "port1"
    set dstintf "port9"
        set srcaddr "addr1" "addr5"             
        set dstaddr "addr6"             
    set action accept
    set schedule "always"
        set service "ICMP_ANY"             
    set logtraffic enable
    set comments "This is the second one"
    set nat enable
    set ippool enable
        set poolname "name1"             
next

Here is my regex so far:

r'edit ([\d]+)\s+set srcintf "(.+?)"\s+set dstintf "(.+?)"\s+set srcaddr (.+?)\s+set dstaddr (.+?)\s+set action ([\w]+)\s+(?:set status ([\w]+)\s+)?set schedule "(.+?)"\s+set service (.+?)\s+(?:set .*?\s+)*?(?:set poolname "(.+?)"\s+)?(?:set .*\s+)*?next'

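Put simply, I want to ignore everything after set service, except for the optional poolname field, which I want to capture.

The problem with my regex is that (?:set .*?\s+)*? consumes the set poolname line despite the non-greedy modifier.

If poolname were mandatory, the regex would work perfectly, but it isn't. Any ideas?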
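A minimal Python reproduction of the problem, assuming the config above is held in a string (trailing spaces dropped, which `\s+` absorbs either way). The lazy loop first tries zero repetitions; the optional poolname group then fails at `set logtraffic` and is skipped; and the trailing `(?:set .*\s+)*?` swallows every remaining line, poolname included. Because that path already succeeds, the engine never backtracks to fill the capture. Removing the `?` from the poolname group takes away that shortcut, which is why the mandatory version works.

```python
import re

# The config from the question; trailing spaces dropped (\s+ absorbs them anyway).
CONFIG = """edit 750
    set srcintf "port1"
    set dstintf "port9"
        set srcaddr "addr1" "addr5"
        set dstaddr "addr6"
    set action accept
    set schedule "always"
        set service "ICMP_ANY"
    set logtraffic enable
    set comments "This is the second one"
    set nat enable
    set ippool enable
        set poolname "name1"
next
"""

# The asker's pattern, split into adjacent raw-string literals for readability.
ORIGINAL = (
    r'edit ([\d]+)\s+set srcintf "(.+?)"\s+set dstintf "(.+?)"\s+'
    r'set srcaddr (.+?)\s+set dstaddr (.+?)\s+set action ([\w]+)\s+'
    r'(?:set status ([\w]+)\s+)?set schedule "(.+?)"\s+set service (.+?)\s+'
    r'(?:set .*?\s+)*?(?:set poolname "(.+?)"\s+)?(?:set .*\s+)*?next'
)

# Same pattern with the poolname group made mandatory (its trailing "?" removed).
MANDATORY = ORIGINAL.replace(r'(?:set poolname "(.+?)"\s+)?',
                             r'(?:set poolname "(.+?)"\s+)')

optional_pool = re.findall(ORIGINAL, CONFIG)[0][9]    # index 9 = group 10 (poolname)
mandatory_pool = re.findall(MANDATORY, CONFIG)[0][9]
print(repr(optional_pool), repr(mandatory_pool))      # '' 'name1'
```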

1 answer:

Answer (score: 1)

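It's fairly easy: introduce a negative lookahead (?! .. ) so the skip loop can never consume the poolname line, and make that loop greedy (* instead of *?) so it eats every other set line first and stops right in front of poolname.
I'd recommend a tool such as RegexFormat for working with large regexes like this one.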
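The idea can be isolated on a toy input. The text and patterns below are simplified, hypothetical stand-ins (a `set pool` line plays the role of poolname, `end` plays the role of `next`, and the lookahead is shortened to `(?!pool )`). The lazy version reproduces the bug; the greedy, lookahead-guarded version captures the value and still matches when the line is absent.

```python
import re

# Hypothetical toy input: "pool" stands in for poolname, "end" for next.
TEXT = "set a 1\nset b 2\nset pool p1\nset c 3\nend"

# Lazy skip loop + optional capture: zero skips are tried first, the optional
# group fails on "set a 1" and is dropped, and the trailing loop eats the rest.
LAZY = r'(?:set .*?\s+)*?(?:set pool (\w+)\s+)?(?:set .*\s+)*?end'

# Greedy skip loop guarded by a negative lookahead: it may consume any set line
# except "set pool ...", so it stops right before that line.
GUARDED = r'(?:set (?!pool ).*?\s+)*(?:set pool (\w+)\s+)?(?:set .*\s+)*end'

print(re.match(LAZY, TEXT).group(1))               # None
print(re.match(GUARDED, TEXT).group(1))            # p1
print(re.match(GUARDED, "set a 1\nend").group(1))  # None: the pool line stays optional
```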

 #  edit[ ]([\d]+)\s+set[ ]srcintf[ ]"(.+?)"\s+set[ ]dstintf[ ]"(.+?)"\s+set[ ]srcaddr[ ](.+?)\s+set[ ]dstaddr[ ](.+?)\s+set[ ]action[ ]([\w]+)\s+(?:set[ ]status[ ]([\w]+)\s+)?set[ ]schedule[ ]"(.+?)"\s+set[ ]service[ ](.+?)\s+(?:set[ ](?!poolname[ ]".+?").*?\s+)*(?:set[ ]poolname[ ]"(.+?)"\s+)?(?:set[ ].*\s+)*next

 edit [ ] 
 ( [\d]+ )                          # (1)
 \s+ set [ ] srcintf [ ] "
 ( .+? )                            # (2)
 " \s+ set [ ] dstintf [ ] "
 ( .+? )                            # (3)
 " \s+ set [ ] srcaddr [ ] 
 ( .+? )                            # (4)
 \s+ set [ ] dstaddr [ ] 
 ( .+? )                            # (5)
 \s+ set [ ] action [ ] 
 ( [\w]+ )                          # (6)
 \s+ 
 (?:
      set [ ] status [ ] 
      ( [\w]+ )                     # (7)
      \s+ 
 )?
 set [ ] schedule [ ] "
 ( .+? )                            # (8)
 " \s+ set [ ] service [ ] 
 ( .+? )                            # (9)
 \s+ 
 (?:
      set [ ] 
      (?! poolname [ ] " .+? " )
      .*? 
      \s+ 
 )*
 (?:
      set [ ] poolname [ ] "
      ( .+? )                       # (10)
      " \s+ 
 )?
 (?: set [ ] .* \s+ )*
 next
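Since the question uses Python, here is a sketch of the expanded form above compiled with `re.VERBOSE`, which ignores layout whitespace and `#` comments outside character classes; that is why literal spaces are written as `[ ]`. The config string is the one from the question with trailing spaces dropped; the names `CONFIG` and `POLICY_RE` are illustrative.

```python
import re

CONFIG = """edit 750
    set srcintf "port1"
    set dstintf "port9"
        set srcaddr "addr1" "addr5"
        set dstaddr "addr6"
    set action accept
    set schedule "always"
        set service "ICMP_ANY"
    set logtraffic enable
    set comments "This is the second one"
    set nat enable
    set ippool enable
        set poolname "name1"
next
"""

# The answer's expanded pattern; re.VERBOSE ignores layout whitespace and
# "#" comments, so literal spaces must stay inside [ ].
POLICY_RE = re.compile(r'''
    edit [ ] ( [\d]+ )                            # (1) policy id
    \s+ set [ ] srcintf [ ] " ( .+? ) "           # (2)
    \s+ set [ ] dstintf [ ] " ( .+? ) "           # (3)
    \s+ set [ ] srcaddr [ ] ( .+? )               # (4)
    \s+ set [ ] dstaddr [ ] ( .+? )               # (5)
    \s+ set [ ] action [ ] ( [\w]+ )              # (6)
    \s+ (?: set [ ] status [ ] ( [\w]+ ) \s+ )?   # (7) optional status
    set [ ] schedule [ ] " ( .+? ) "              # (8)
    \s+ set [ ] service [ ] ( .+? )               # (9)
    \s+
    (?: set [ ] (?! poolname [ ] " .+? " ) .*? \s+ )*   # skip set lines, never poolname
    (?: set [ ] poolname [ ] " ( .+? ) " \s+ )?         # (10) optional poolname
    (?: set [ ] .* \s+ )*                                # skip anything after poolname
    next
''', re.VERBOSE)

m = POLICY_RE.search(CONFIG)
print(m.group(1), m.group(9), m.group(10))   # 750 "ICMP_ANY" name1
```

Unlike `re.findall`, which reports an unmatched group as an empty string, `m.group(7)` here is `None` because the config has no `set status` line.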

Perl test case:

$/ = undef;

$str = <DATA>;

while ( $str =~ /edit[ ]([\d]+)\s+set[ ]srcintf[ ]"(.+?)"\s+set[ ]dstintf[ ]"(.+?)"\s+set[ ]srcaddr[ ](.+?)\s+set[ ]dstaddr[ ](.+?)\s+set[ ]action[ ]([\w]+)\s+(?:set[ ]status[ ]([\w]+)\s+)?set[ ]schedule[ ]"(.+?)"\s+set[ ]service[ ](.+?)\s+(?:set[ ](?!poolname[ ]".+?").*?\s+)*(?:set[ ]poolname[ ]"(.+?)"\s+)?(?:set[ ].*\s+)*next/g )
{
    print "----------------------\n";
    print "1 = $1\n";
    print "2 = $2\n";
    print "3 = $3\n";
    print "4 = $4\n";
    print "5 = $5\n";
    print "6 = $6\n";
    print "7 = $7\n";
    print "8 = $8\n";
    print "9 = $9\n";
    print "Poolname = $10\n";
}


__DATA__

edit 750
    set srcintf "port1"
    set dstintf "port9"
        set srcaddr "addr1" "addr5"             
        set dstaddr "addr6"             
    set action accept
    set schedule "always"
        set service "ICMP_ANY"             
    set logtraffic enable
    set comments "This is the second one"
    set nat enable
    set ippool enable
        set poolname "name1"             
next

Output >>

----------------------
1 = 750
2 = port1
3 = port9
4 = "addr1" "addr5"
5 = "addr6"
6 = accept
7 =
8 = always
9 = "ICMP_ANY"
Poolname = name1
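In Python, the Perl `while ($str =~ /.../g)` loop maps onto `re.finditer`. The sketch below ports the test and appends a second, hypothetical policy (`edit 751`, not from the question) that has a `set status` line but no poolname, to check that the optional group really is optional: the match still succeeds and group 10 is simply unset. `m.group(i) or ''` mimics Perl printing an undefined capture as an empty string.

```python
import re

# Compact one-line form of the answer's pattern, split into adjacent literals.
PATTERN = re.compile(
    r'edit[ ]([\d]+)\s+set[ ]srcintf[ ]"(.+?)"\s+set[ ]dstintf[ ]"(.+?)"\s+'
    r'set[ ]srcaddr[ ](.+?)\s+set[ ]dstaddr[ ](.+?)\s+set[ ]action[ ]([\w]+)\s+'
    r'(?:set[ ]status[ ]([\w]+)\s+)?set[ ]schedule[ ]"(.+?)"\s+set[ ]service[ ](.+?)\s+'
    r'(?:set[ ](?!poolname[ ]".+?").*?\s+)*(?:set[ ]poolname[ ]"(.+?)"\s+)?(?:set[ ].*\s+)*next'
)

# First block is the question's data; "edit 751" is a hypothetical extra policy.
DATA = """edit 750
    set srcintf "port1"
    set dstintf "port9"
        set srcaddr "addr1" "addr5"
        set dstaddr "addr6"
    set action accept
    set schedule "always"
        set service "ICMP_ANY"
    set logtraffic enable
    set comments "This is the second one"
    set nat enable
    set ippool enable
        set poolname "name1"
next
edit 751
    set srcintf "port2"
    set dstintf "port9"
    set srcaddr "addr2"
    set dstaddr "addr6"
    set action accept
    set status disable
    set schedule "always"
    set service "HTTP"
    set logtraffic enable
next
"""

for m in PATTERN.finditer(DATA):
    print("----------------------")
    for i in range(1, 10):
        print(f"{i} = {m.group(i) or ''}")  # unset groups print as ''
    print(f"Poolname = {m.group(10) or ''}")
```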