How to parse multi-line text blocks with Python and regex when the content differs from block to block

Time: 2016-10-31 20:08:52

Tags: python regex multiline

I have a configuration file that I need to parse. My idea is to use regex groupings in Python to put the values into a dictionary.

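The problem I'm facing is that the blocks do not all contain exactly the same lines. So far my regex works for the block with the most lines, but of course it only matches that single block. How do I do a multiline match when some "set" lines are omitted in some blocks?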

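  • Do I need to break up the regex and use if / elif / true / false statements to work through this? That doesn't seem Pythonic, IMHO.

  • I'm fairly sure I will have to split up my big regex and work through it sequentially: if a line matches, then ..., else skip to the next regex and the next line. Is that right?

  • Should I consider putting each block, from edit to next, into its own list element and parsing the elements separately? Or can I do the whole thing in one go?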

I have some ideas, but I'm hoping for a Pythonic way to do this.

As always, any help is much appreciated. Thanks.

TEXT: the blocks to match run from edit to next. Not every block contains the same "set" statements:

edit "port11"
    set vdom "ACME_Prod"
    set vlanforward enable
    set type physical
    set device-identification enable
    set snmp-index 26
next
edit "port21"
    set vdom "ACME_Prod"
    set vlanforward enable
    set type physical
    set snmp-index 27
next
edit "port28"
    set vdom "ACME_Prod"
    set vlanforward enable
    set type physical
    set snmp-index 28
next
edit "port29"
    set vdom "ACME_Prod"
    set ip 174.244.244.244 255.255.255.224
    set allowaccess ping
    set vlanforward enable
    set type physical
    set alias "Internet-IRISnet"
    set snmp-index 29
next
edit "port20"
    set vdom "root"
    set ip 192.168.1.1 255.255.255.0
    set allowaccess ping https ssh snmp fgfm
    set vlanforward enable
    set type physical
    set snmp-index 39
next
edit "port25"
    set vdom "root"
    set allowaccess fgfm
    set vlanforward enable
    set type physical
    set snmp-index 40
next

CODE SNIPPET:

import re, pprint
file = "interfaces_2016_10_12.conf"

try:
    fileopen = open(file, 'r')
    output = open('output.txt', 'w+')
except IOError:
    exit("Input file does not exist, exiting script.")

# read whole config in 1 go instead of iterating line by line
text = fileopen.read()

# my verbose regex, verbose so it is more readable!

pattern = r'''^                 # use r for multiline usage
\s+edit\s\"(.*)\"\n           # group(1) match int name
\s+set\svdom\s\"(.*)\"\n      # group(2) match vdom name
\s+set\sip\s(.*)\n            # group(3) match interface ip
\s+set\sallowaccess\s(.*)\n   # group(4) match allowaccess
\s+set\svlanforward\s(.*)\n   # group(5) match vlanforward
\s+set\stype\s(.*)\n          # group(6) match type
\s+set\salias\s\"(.*)\"\n     # group(7) match alias
\s+set\ssnmp-index\s\d{1,3}\n # match snmp-index but we don't need it
\s+next$'''                   # match end of config block

regexp = re.compile(pattern, re.VERBOSE | re.MULTILINE)

# For multiline regex matching use finditer():
for match in regexp.finditer(text):
    # print groups 1..7 of every match (the counter restarts for each block)
    for z in range(1, 8):
        print(match.group(z))

fileopen.close()  # always close file
output.close()    # always close file
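To illustrate the core of the question (a single regex that tolerates missing "set" lines), here is a minimal sketch that wraps each line that may be absent in an optional non-capturing group, `(?: ... )?`, and uses named groups so the result maps straight onto a dictionary. The names `BLOCK_RE`, `SAMPLE` and `interfaces` are made up for this example, and the pattern assumes the lines always appear in the order shown in the config above; it is not a general parser for this file format.

```python
import re

# Two blocks copied from the config above: one short, one with the extra lines.
SAMPLE = '''edit "port28"
    set vdom "ACME_Prod"
    set vlanforward enable
    set type physical
    set snmp-index 28
next
edit "port29"
    set vdom "ACME_Prod"
    set ip 174.244.244.244 255.255.255.224
    set allowaccess ping
    set vlanforward enable
    set type physical
    set alias "Internet-IRISnet"
    set snmp-index 29
next
'''

# Hypothetical pattern: every line that may be missing is an optional group.
BLOCK_RE = re.compile(r'''
    ^\s*edit\s+"(?P<name>[^"]*)"\n
    \s*set\s+vdom\s+"(?P<vdom>[^"]*)"\n
    (?:\s*set\s+ip\s+(?P<ip>.*)\n)?                    # optional line
    (?:\s*set\s+allowaccess\s+(?P<allowaccess>.*)\n)?  # optional line
    \s*set\s+vlanforward\s+(?P<vlanforward>.*)\n
    \s*set\s+type\s+(?P<type>.*)\n
    (?:\s*set\s+device-identification\s+.*\n)?         # optional, not captured
    (?:\s*set\s+alias\s+"(?P<alias>[^"]*)"\n)?         # optional line
    \s*set\s+snmp-index\s+\d+\n                        # matched but not captured
    \s*next$
''', re.VERBOSE | re.MULTILINE)

interfaces = {}
for m in BLOCK_RE.finditer(SAMPLE):
    # Groups belonging to skipped optional lines come back as None; drop them.
    fields = {k: v for k, v in m.groupdict().items() if v is not None}
    interfaces[fields.pop('name')] = fields

print(interfaces)
```

The trade-off is that any new or reordered "set" line silently makes the whole block fail to match, which is why a line-by-line approach tends to be more robust for this kind of file.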

2 Answers:

Answer 0: (score: 1)

Why use regex when the structure looks so simple to parse?

data = {}
with open(file, 'r') as fileopen:   # `file` is the path defined in the question
    for line in fileopen:
        words = line.strip().split()
        if not words:               # Skip blank lines
            continue
        if words[0] == 'edit':      # Create a new block
            curr = data.setdefault(words[1].strip('"'), {})
        elif words[0] == 'set':     # Write config to block
            curr[words[1]] = words[2].strip('"') if len(words) == 3 else words[2:]
print(data)

Output:

{'port11': {'device-identification': 'enable',
  'snmp-index': '26',
  'type': 'physical',
  'vdom': 'ACME_Prod',
  'vlanforward': 'enable'},
 'port20': {'allowaccess': ['ping', 'https', 'ssh', 'snmp', 'fgfm'],
  'ip': ['192.168.1.1', '255.255.255.0'],
  'snmp-index': '39',
  'type': 'physical',
  'vdom': 'root',
  'vlanforward': 'enable'},
  ...
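
One limitation of splitting on whitespace is that a quoted value containing a space would be broken into several words. As a hedged variation on the same idea, the sketch below uses the standard library's `shlex.split`, which respects double quotes. The function name `parse_config` and the alias value "Internet IRISnet" (with a space) are made up for illustration; the config in the question contains no such value.

```python
import shlex

def parse_config(lines):
    """Return {interface_name: {setting: value}} from an iterable of config lines."""
    data = {}
    curr = None
    for line in lines:
        words = shlex.split(line)   # splits on whitespace, keeps quoted text together
        if not words:               # blank line
            continue
        if words[0] == 'edit':      # start a new block
            curr = data.setdefault(words[1], {})
        elif words[0] == 'set' and curr is not None:
            values = words[2:]
            # single value -> string, several values -> list (same rule as above)
            curr[words[1]] = values[0] if len(values) == 1 else values
        elif words[0] == 'next':    # end of block
            curr = None
    return data

# Hypothetical sample; the alias with a space is an assumption for the demo.
sample = '''edit "port20"
    set vdom "root"
    set ip 192.168.1.1 255.255.255.0
    set alias "Internet IRISnet"
next
'''.splitlines()

result = parse_config(sample)
print(result)
```

Note that `shlex.split` applies shell-like rules, so a value with an unbalanced quote or apostrophe raises `ValueError`; whether that matters depends on what the real config files contain.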

Answer 1: (score: 0)

How about:

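import re

config = {}
with open('datafile') as f:
    text = f.read()
for block in re.split(r'\nnext\n', text):
    for cmd in block.split("\n"):
        cmd = cmd.strip().split()
        if not cmd or cmd[0] == 'next':   # skip blank lines and a trailing "next"
            continue
        if cmd[0] == 'edit':              # start a new block, keyed by its name
            current = cmd[1]
            config[current] = {}
            continue
        config[current][cmd[1]] = cmd[2]  # keeps only the first word of the value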

I think this is readable, but the other answer is preferable in my opinion (no regex). Upvoted it.
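
If regex is still preferred, the question's third idea (isolate each block first, then parse it separately) can be sketched in two stages: one pattern captures everything between edit and next, and a second pattern pulls out whatever "set" lines happen to be inside. The names `BLOCK_RE`, `SET_RE` and `parse_blocks` are invented for this sketch, and values are kept as plain strings rather than split into lists.

```python
import re

# Stage 1: capture the block name and everything up to the closing "next".
BLOCK_RE = re.compile(r'^\s*edit\s+"([^"]*)"\n(.*?)^\s*next\s*$',
                      re.MULTILINE | re.DOTALL)
# Stage 2: capture "set <key> <value>" on a single line.
SET_RE = re.compile(r'^[ \t]*set[ \t]+(\S+)[ \t]+(.*)$', re.MULTILINE)

def parse_blocks(text):
    """Return {block_name: {key: value}}; missing "set" lines are simply absent."""
    config = {}
    for name, body in BLOCK_RE.findall(text):
        config[name] = {key: value.strip().strip('"')
                        for key, value in SET_RE.findall(body)}
    return config

# Two blocks copied from the question's config.
sample = '''edit "port21"
    set vdom "ACME_Prod"
    set vlanforward enable
    set type physical
    set snmp-index 27
next
edit "port25"
    set vdom "root"
    set allowaccess fgfm
    set vlanforward enable
    set type physical
    set snmp-index 40
next
'''

parsed = parse_blocks(sample)
print(parsed)
```

Because the second stage does not care which settings are present or in what order, blocks with different sets of lines are handled by the same two patterns.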