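Report from a text file - Python

Time: 2017-09-23 14:15:45

Tags: regex python-2.7 text

I have some files from which I need to extract certain information. Each file can have around 15K lines. Sample file contents (note that I have masked the IPs):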

(*, 224.0.0.50/32), uptime: 27w6d, igmp ip pim                    
  Incoming interface: Ethernet1/36, RPF nbr: 1.1.1.2, uptime: 1w4d
  Outgoing interface list: (count: 3)                             
    Ethernet1/47, uptime: 1w5d, pim                               
    Vlan25, uptime: 7w4d, igmp                                    
    Vlan20, uptime: 27w6d, igmp                                   

(1.1.1.1/32, 224.0.0.50/32), uptime: 09:51:59, ip mrib pim             
  Incoming interface: Ethernet1/36, RPF nbr: 1.1.1.2, uptime: 09:51:59
  Outgoing interface list: (count: 3)                                 
    Ethernet1/47, uptime: 09:51:59, pim                               
    Vlan20, uptime: 09:51:59, mrib                                    
    Vlan25, uptime: 09:51:59, mrib

What I need to do is run through the file and print the following:

Source IP  Group IP     Incoming Interface     Outgoing Interface
1.1.1.1    224.0.0.50    Ethernet1/36           Vlan20, Vlan25

What I wrote is:

import re

mroute = open("multicast.txt", 'r')

for line in mroute:
    if re.match("(.*)(\()1(.*)", line):
        print line
for line in mroute:
    if re.match("(.*)(In)(.*)",line):
       print line
for line in mroute:
    if re.match("(^)(Out)(.*)",line):
       print line

Each part works on its own, but when I combine them, nothing is displayed.

1 answer:

Answer 0: (score: 0)

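I believe your problem is that each of your `for line in mroute` loops exhausts the iterator: the first loop reads the file to the end, so the later loops have nothing left to read. A better approach is to collect the lines into groups and process them as you go.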
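
To illustrate the exhausted-iterator point, here is a minimal sketch. It uses Python 3 syntax, and `io.StringIO` stands in for the opened file; both are assumptions of this example rather than part of the question. It shows that a second loop over the same file object sees no lines, and that a single pass checking every condition avoids the problem.

```python
import io

# Stand-in for open("multicast.txt"); a real file object iterates the same way.
mroute = io.StringIO(
    "(1.1.1.1/32, 224.0.0.50/32), uptime: 09:51:59, ip mrib pim\n"
    "  Incoming interface: Ethernet1/36, RPF nbr: 1.1.1.2, uptime: 09:51:59\n"
    "  Outgoing interface list: (count: 3)\n"
)

first_pass = [line for line in mroute]   # consumes every line
second_pass = [line for line in mroute]  # the iterator is already at end of file

print(len(first_pass), len(second_pass))  # prints: 3 0

# A single pass that tests each line against all conditions instead.
mroute.seek(0)  # rewind; only needed here because the data was already read
matched = []
for line in mroute:
    stripped = line.strip()  # drop the leading indentation before testing
    if stripped.startswith(("(", "Incoming", "Outgoing")):
        matched.append(stripped)
```

Note also that `re.match` only matches at the start of the string, so the question's third pattern, `(^)(Out)(.*)`, would not match the indented `Outgoing` lines even after rewinding the file; stripping the line first, as in the sketch, sidesteps that.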

Here's what I would start with:

import sys, re

class Details(dict):
    _format = '%(source)-20s %(group)-20s %(incoming)-20s %(outgoing)-20s'
    header  = _format % {
        'source':'Source IP',
        'group': 'Group IP',
        'incoming': 'Incoming Interface',
        'outgoing': 'Outgoing Interface'
    }
    def __repr__(self):
        return self._format % self

    def __init__(self, g):
        super(Details, self).__init__()
        self.g = g
        self.parse_line(0, "^[(](?P<source>[^,]+), (?P<group>[^)]+)")
        self.parse_line(1, "^  Incoming interface: (?P<incoming>[^,]+),")
        self.parse_line(2, "^  Outgoing interface list: (?P<unused>.+)")
        for l in self.g[3:]:
            p = "^    ([^,]+),"
            m = re.search(p, l)
            if m:
                o = self.get('outgoing','')
                if o: o += ', '
                self['outgoing'] = o + m.group(1)

    def parse_line(self, n, p, u='?'):
        r = re.compile(p)
        e = {x:u for x in r.groupindex}
        m = r.search(self.g[n])
        d = m.groupdict() if m else e
        self.update(d)

    @staticmethod
    def parse(lines):
        groups = [[]]
        for line in lines:
            if line.startswith("("):
                groups.append([])
            groups[-1].append(line)
        return [Details(g) for g in groups if g]

print Details.header
# open() in a with block closes the file when done; file() is Python 2 only
with open(sys.argv[1]) as f:
    for d in Details.parse(f):
        print d

Sample run:

Source IP            Group IP             Incoming Interface   Outgoing Interface  
*                    224.0.0.50/32        Ethernet1/36         Ethernet1/47, Vlan25, Vlan20
1.1.1.1/32           224.0.0.50/32        Ethernet1/36         Ethernet1/47, Vlan20, Vlan25

This doesn't handle irregular data or the semantics of the interfaces, but that's easy to add if you know what you want.
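
As one possible way to add those semantics, the sketch below reproduces the row requested in the question. It is written for Python 3 (the code above is Python 2.7) and rests on assumptions inferred only from the desired output: wildcard (`*`) sources are skipped, the `/32` suffix is dropped, and only `Vlan` interfaces are listed as outgoing. The names `parse_routes`, `strip_mask`, and `wanted` are hypothetical helpers introduced for this example.

```python
import re

SAMPLE = """\
(*, 224.0.0.50/32), uptime: 27w6d, igmp ip pim
  Incoming interface: Ethernet1/36, RPF nbr: 1.1.1.2, uptime: 1w4d
  Outgoing interface list: (count: 3)
    Ethernet1/47, uptime: 1w5d, pim
    Vlan25, uptime: 7w4d, igmp
    Vlan20, uptime: 27w6d, igmp

(1.1.1.1/32, 224.0.0.50/32), uptime: 09:51:59, ip mrib pim
  Incoming interface: Ethernet1/36, RPF nbr: 1.1.1.2, uptime: 09:51:59
  Outgoing interface list: (count: 3)
    Ethernet1/47, uptime: 09:51:59, pim
    Vlan20, uptime: 09:51:59, mrib
    Vlan25, uptime: 09:51:59, mrib
"""

HEADER_RE = re.compile(r"^\((?P<source>[^,]+), (?P<group>[^)]+)\)")
INCOMING_RE = re.compile(r"^\s+Incoming interface: (?P<incoming>[^,]+),")
OUTGOING_RE = re.compile(r"^\s{4}(?P<name>[^,\s]+),")


def strip_mask(address):
    """Drop a trailing prefix length such as '/32' (hypothetical helper)."""
    return address.split("/", 1)[0]


def parse_routes(lines):
    """Yield one dict per (source, group) entry in a single pass."""
    route = None
    for line in lines:
        header = HEADER_RE.match(line)
        if header:
            if route is not None:
                yield route  # the previous entry is complete
            route = {
                "source": strip_mask(header.group("source")),
                "group": strip_mask(header.group("group")),
                "incoming": "?",
                "outgoing": [],
            }
            continue
        if route is None:
            continue  # ignore anything before the first entry
        incoming = INCOMING_RE.match(line)
        if incoming:
            route["incoming"] = incoming.group("incoming")
            continue
        outgoing = OUTGOING_RE.match(line)
        if outgoing:
            route["outgoing"].append(outgoing.group("name"))
    if route is not None:
        yield route  # the last entry has no following header


def wanted(route):
    """Assumed filter: skip entries whose source is the '*' wildcard."""
    return route["source"] != "*"


rows = []
for route in parse_routes(SAMPLE.splitlines()):
    if not wanted(route):
        continue
    # Assumption: only Vlan interfaces belong in the outgoing column.
    vlans = [name for name in route["outgoing"] if name.startswith("Vlan")]
    rows.append((route["source"], route["group"], route["incoming"], ", ".join(vlans)))

for row in rows:
    print("%-10s %-13s %-22s %s" % row)
```

To run it against a real file, pass the open file object to `parse_routes` instead of `SAMPLE.splitlines()`. Because it is a generator, entries are handled one at a time rather than collected into a list first.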