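I want to parse this log file, which contains command output in a specific format: a number followed by an IP address on the next line. I want to extract each IP and sum its total number of connections, for example from input like the following: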
[0;32m192.168.1.34 | SUCCESS | rc=0 >>
2
192.168.1.97
5
192.168.1.152
3
192.168.2.108
11
192.168.2.144
[0m
[0;32m192.168.1.18 | SUCCESS | rc=0 >>
2
192.168.1.97
3
192.168.1.152
14
192.168.2.108
7
192.168.2.144
[0m
[0;32m192.168.2.137 | SUCCESS | rc=0 >>
5
192.168.1.97
10
192.168.1.152
53
192.168.2.108
6
192.168.2.144
[0m
[0;32m192.168.1.96 | SUCCESS | rc=0 >>
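The script should read the input as a file, store the counts and IPs, and then output pairs of IP and total, for example 192.168.1.97:84,192.168.1.152:66
In the Python script below, I want to use collections.Counter, a regex, and a key-value dictionary to iterate over the data. My initial version of the code is as follows: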
#!/usr/bin/env python
import os
from collections import Counter
import re
import urllib

def main():
    # read the content
    parse_content = urllib.urlopen(
        'https://bpaste.net/raw/477b79a86b42').read()
    count = Counter()
    for line in parse_content:
        line = line.rstrip()
        print count
        # r'(\d{1,3}\.){3}\d{1,3}'
        if re.search(r'/^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$', line):
            print line

if __name__ == '__main__':
    main()
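Can I loop over two sequences, storing the IPs in the first and summing the counts in the other? If so, how should I write the logic? My initial attempt at Python pseudocode is: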
conn_n = ['2', '5', '3']
ip_seq = ['192.168.1.97', '192.168.1.152', '192.168.1.108']
dict = {"conn_n":"ip_seq"}
for key,val in d.items():
    if line.startswith('['):
        print re.findall(r'/^([0-9]{1,3}\.){3}[0-9]{1,3}(\/([0-9]|[1-2][0-9]|3[0-2]))?$/', line)
    ....
    ....
    print("{} = {}".format(key, val))
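Should I use lists and loop over them, or a dictionary? I would greatly appreciate some guidance on how to approach this problem.
Thanks in advance for your help!
Answer 0: (score: 0)
I don't think you need regular expressions at all. Just pay attention to the nature of your lines: header lines contain ' |', and every line you care about starts with a digit. I chose to write generators to make it easier to see what is going on: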
#!/usr/bin/env python2.7
import collections
import ssl  # See note in logfile_lines, below
import urllib

LOGFILE = r'https://bpaste.net/raw/477b79a86b42'

def logfile_lines(url):
    # NOTE: This business of `_create_unverified_context` is because I
    # don't have py2.7 set up correctly w/r/t ssl. I normally use py3.
    # If you're stuck using 2.7, go ahead and get your certificate verification
    # working properly!
    logfile = urllib.urlopen(url, context=ssl._create_unverified_context())
    for line in logfile:
        if ' |' in line:
            continue
        if not line[0].isdigit():
            continue
        yield line.strip()

def count_address_pairs(url):
    logiter = iter(logfile_lines(url))
    for count, address in zip(logiter, logiter):
        yield (int(count), address)

counts = collections.Counter()
for cnt, addr in count_address_pairs(LOGFILE):
    counts[addr] += cnt
print(counts)
The output is straightforward:
$ ./test.py
Counter({'192.168.2.108': 138, '192.168.2.144': 87, '192.168.1.97': 84, '192.168.1.152': 66})
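If you are on Python 3 and would rather read from a local file than a URL, the same idea can be sketched as follows. This is only a rough port under a few assumptions: the names `SAMPLE_LOG`, `data_lines`, `total_connections` and `format_totals` are made up for illustration, the sample text is just the first two blocks of the log from the question, and the `ip:total` output format is my reading of what the question asks for.

```python
import collections

# Hypothetical sample: the first two blocks of the log shown in the question.
SAMPLE_LOG = """\
[0;32m192.168.1.34 | SUCCESS | rc=0 >>
2
192.168.1.97
5
192.168.1.152
3
192.168.2.108
11
192.168.2.144
[0m
[0;32m192.168.1.18 | SUCCESS | rc=0 >>
2
192.168.1.97
3
192.168.1.152
14
192.168.2.108
7
192.168.2.144
[0m
"""


def data_lines(lines):
    """Yield only count and address lines, skipping headers and colour codes."""
    for line in lines:
        line = line.strip()
        # Header lines contain ' |'; colour-reset and blank lines
        # do not start with a digit.
        if ' |' in line or not line[:1].isdigit():
            continue
        yield line


def total_connections(lines):
    """Sum the connection counts per IP address."""
    it = data_lines(lines)
    totals = collections.Counter()
    # zip(it, it) pulls two consecutive items from the same iterator,
    # so each pair is (count line, address line).
    for count, address in zip(it, it):
        totals[address] += int(count)
    return totals


def format_totals(totals):
    """Render totals as comma-separated 'ip:total' pairs, largest first."""
    return ','.join('{}:{}'.format(ip, n) for ip, n in totals.most_common())


totals = total_connections(SAMPLE_LOG.splitlines())
print(format_totals(totals))
# prints: 192.168.2.144:18,192.168.2.108:17,192.168.1.152:8,192.168.1.97:4
```

To run it against a real file, pass the open file object instead of the sample, for example `with open('connections.log') as f: totals = total_connections(f)`, where `connections.log` is a placeholder name. Note that the pairing relies on every count line being followed by exactly one address line; if a block is truncated, the pairs will drift out of step.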