How to parse a log file with Python regex into a key-value dictionary of IPs and numeric values

Time: 2017-04-27 00:28:19

Tags: python python-2.7 parsing dictionary logging

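I want to parse this log file, which contains connection counts and IP addresses in a specific format, with each count printed on the line before its IP. I want to collect the IPs and sum up their total connections. The log looks like this: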

[0;32m192.168.1.34 | SUCCESS | rc=0 >>
2
192.168.1.97
5
192.168.1.152
3
192.168.2.108
11
192.168.2.144
[0m
[0;32m192.168.1.18 | SUCCESS | rc=0 >>
2
192.168.1.97
3
192.168.1.152
14
192.168.2.108
7
192.168.2.144
[0m
[0;32m192.168.2.137 | SUCCESS | rc=0 >>
5
192.168.1.97
10
192.168.1.152
53
192.168.2.108
6
192.168.2.144
[0m
[0;32m192.168.1.96 | SUCCESS | rc=0 >>

The script should read its input from a file, store the counts and IPs, and then output IP:total pairs, e.g. 192.168.1.97:84, 192.168.1.152:66
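A minimal sketch of that input/output contract, assuming the log is a file-like object of text lines where a bare count is always followed by its bare IP. The sample text, and the helper names `is_ip`, `total_connections` and `format_totals`, are hypothetical and used only for illustration. `io.StringIO` stands in for a real file.

```python
import io
from collections import Counter

# Hypothetical sample: the first two blocks of the log above, with the
# ANSI colour codes written out as "\x1b[...m".
SAMPLE = (u"\x1b[0;32m192.168.1.34 | SUCCESS | rc=0 >>\n"
          u"2\n192.168.1.97\n5\n192.168.1.152\n3\n192.168.2.108\n11\n192.168.2.144\n\x1b[0m\n"
          u"\x1b[0;32m192.168.1.18 | SUCCESS | rc=0 >>\n"
          u"2\n192.168.1.97\n3\n192.168.1.152\n14\n192.168.2.108\n7\n192.168.2.144\n\x1b[0m\n")


def is_ip(text):
    """True for a bare dotted quad such as '192.168.1.97'."""
    parts = text.split('.')
    return len(parts) == 4 and all(p.isdigit() for p in parts)


def total_connections(fileobj):
    """Add each count line to the IP line that immediately follows it."""
    totals = Counter()
    pending = None  # last count seen, waiting for its IP
    for raw in fileobj:
        line = raw.strip()
        if line.isdigit():
            pending = int(line)
        elif pending is not None and is_ip(line):
            totals[line] += pending
            pending = None
        else:
            pending = None  # headers and colour codes break a pair
    return totals


def format_totals(totals):
    """Render 'ip:total' pairs, largest total first."""
    return ', '.join('{}:{}'.format(ip, n) for ip, n in totals.most_common())


totals = total_connections(io.StringIO(SAMPLE))
print(format_totals(totals))
```

With a real file, `open('path/to/log')` (a hypothetical path) could be passed to `total_connections` in place of the `StringIO` object.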

In the Python script below I want to use collections.Counter, a regex and a key-value dictionary for the iteration. My initial version of the code is:

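#!/usr/bin/env python

import re
import urllib
from collections import Counter

# A whole line that is a dotted-quad IPv4 address, each octet 0-255.
# Python regexes are plain strings, not /.../ literals.
IP_RE = re.compile(
    r'^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}'
    r'([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$')


def main():

    # read the content (Python 2 urllib API)
    parse_content = urllib.urlopen(
        'https://bpaste.net/raw/477b79a86b42').read()

    count = Counter()  # meant to hold the per-IP totals (not filled in yet)

    # splitlines() yields lines; iterating the raw string would yield characters
    for line in parse_content.splitlines():
        line = line.rstrip()
        if IP_RE.search(line):
            print(line)


if __name__ == '__main__':
    main()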
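For comparison, a sketch of a regex-only approach. Iterating over the string returned by `.read()` visits it one character at a time, and a pattern that starts with `/` demands a literal slash, so the original loop can never match a line. This version instead runs one `re.MULTILINE` pattern over the whole text and captures each count together with the IP on the next line. The pattern `PAIR_RE` and the sample text are assumptions for illustration, and the `\d{1,3}` octet check is deliberately loose.

```python
import re
from collections import Counter

# One line holding only a count, immediately followed by one line holding
# only an IPv4 address. \d{1,3} is a loose octet check (it accepts 999).
PAIR_RE = re.compile(r'^(\d+)\n((?:\d{1,3}\.){3}\d{1,3})$', re.MULTILINE)

# Hypothetical sample: the first two blocks of the log in the question.
SAMPLE = ("\x1b[0;32m192.168.1.34 | SUCCESS | rc=0 >>\n"
          "2\n192.168.1.97\n5\n192.168.1.152\n3\n192.168.2.108\n11\n192.168.2.144\n\x1b[0m\n"
          "\x1b[0;32m192.168.1.18 | SUCCESS | rc=0 >>\n"
          "2\n192.168.1.97\n3\n192.168.1.152\n14\n192.168.2.108\n7\n192.168.2.144\n\x1b[0m\n")

totals = Counter()
# findall returns one (count, ip) tuple per match because there are two groups
for count, ip in PAIR_RE.findall(SAMPLE):
    totals[ip] += int(count)

print(totals.most_common())
```

The header lines never match because they start with an escape code rather than a digit.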

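Can I loop over two sequences, storing the IPs in the first and summing the counts in the other? If so, how should I write the logic? My first attempt in Python pseudocode is: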

from collections import Counter

conn_n = ['2', '5', '3']
ip_seq = ['192.168.1.97', '192.168.1.152', '192.168.2.108']
totals = Counter()

# walk both sequences in lockstep and add each count to its IP
for n, ip in zip(conn_n, ip_seq):
    totals[ip] += int(n)

for key, val in totals.items():
    print("{} = {}".format(key, val))

Should I use lists and loop over them, or a dictionary? I would really appreciate some help with how to approach this problem.

Thanks in advance for your help!
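On the lists-versus-dictionary question, one hedged sketch: lists work for holding the raw alternating lines, and slicing splits them into counts and IPs. The running total per IP, however, fits naturally in a dictionary keyed by IP. The `lines` list below is hypothetical (the data lines of the first two blocks, already stripped), and a plain `dict` with `dict.get()` is used so the mechanism is visible; `collections.Counter` does the same with `+=`.

```python
# Hypothetical input: the data lines of the first two log blocks,
# already stripped and alternating count, IP, count, IP, ...
lines = ['2', '192.168.1.97', '5', '192.168.1.152',
         '3', '192.168.2.108', '11', '192.168.2.144',
         '2', '192.168.1.97', '3', '192.168.1.152',
         '14', '192.168.2.108', '7', '192.168.2.144']

conn_n = lines[0::2]  # even positions: the counts
ip_seq = lines[1::2]  # odd positions: the IPs

# dict.get() supplies 0 the first time an IP is seen
totals = {}
for n, ip in zip(conn_n, ip_seq):
    totals[ip] = totals.get(ip, 0) + int(n)

for ip in sorted(totals):
    print('{}:{}'.format(ip, totals[ip]))
```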

1 Answer:

Answer 0 (score: 0)

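I don't think you need regular expressions at all. Just pay attention to the structure of your lines. I chose to write generators so that it is easier to follow what is going on: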

#!/usr/bin/env python2.7

import collections
import ssl    # See note in logfile_lines, below
import urllib

LOGFILE = r'https://bpaste.net/raw/477b79a86b42'

def logfile_lines(url):

    # NOTE: This business of `_create_unverified_context` is because I 
    # don't have py2.7 set up correctly w/r/t ssl. I normally use py3.
    # If you're stuck using 2.7, go ahead and get your certificate verification 
    # working properly!

    logfile = urllib.urlopen(url, context=ssl._create_unverified_context())

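    for line in logfile:
        # skip the "host | SUCCESS | rc=0 >>" header lines
        if ' |' in line:
            continue

        # skip colour-code lines such as "[0m" that don't start with a digit
        if not line[0].isdigit():
            continue

        yield line.strip()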

def count_address_pairs(url):

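    logiter = iter(logfile_lines(url))

    # zip() draws from the same iterator twice per step, so consecutive
    # data lines are paired up as (count, address).
    for count, address in zip(logiter, logiter):
        yield (int(count), address)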

counts = collections.Counter()

for cnt, addr in count_address_pairs(LOGFILE):
    counts[addr] += cnt

print(counts)
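To try the same generator pipeline without network access, the filtering can be applied to any iterable of lines. This sketch uses hypothetical names (`data_lines`, `pairs_from_lines`, `SAMPLE_LINES`) and the first log block as input. It makes one small change: `line[:1]` in place of `line[0]`, so an empty string does not raise `IndexError`.

```python
import collections


def data_lines(lines):
    """The checks from logfile_lines, applied to any iterable of lines."""
    for line in lines:
        if ' |' in line:             # "host | SUCCESS | rc=0 >>" headers
            continue
        if not line[:1].isdigit():   # colour-code lines, blank lines
            continue
        yield line.strip()


def pairs_from_lines(lines):
    it = data_lines(lines)
    # zip() pulls from the same iterator twice per step, so it yields
    # (1st, 2nd), (3rd, 4th), ... i.e. (count, address) tuples.
    for count, address in zip(it, it):
        yield int(count), address


# Hypothetical input: the first log block from the question.
SAMPLE_LINES = [
    '\x1b[0;32m192.168.1.34 | SUCCESS | rc=0 >>\n',
    '2\n', '192.168.1.97\n', '5\n', '192.168.1.152\n',
    '3\n', '192.168.2.108\n', '11\n', '192.168.2.144\n',
    '\x1b[0m\n',
]

counts = collections.Counter()
for cnt, addr in pairs_from_lines(SAMPLE_LINES):
    counts[addr] += cnt
print(counts)
```

One consequence of the `zip(it, it)` idiom: if the data lines ever have an odd length, the final unpaired count is silently dropped.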

The output is straightforward:

$ ./test.py
Counter({'192.168.2.108': 138, '192.168.2.144': 87, '192.168.1.97': 84, '192.168.1.152': 66})