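Hi, I need help parsing the following data from a text file and reformatting it as shown in the output below. Basically, it should count each status and add that count as a new column, and it should also collapse duplicate entries that share the same date and IP.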
Sample Input Data to read
11/11/2015 9.9.9.9 30s success
11/11/2015 9.9.9.8 30s stuck
11/11/2015 9.9.9.9 30s Sync
11/11/2015 9.9.9.9 30s success
11/12/2015 9.9.9.9 30s success
11/12/2015 9.9.9.9 30s stuck
11/12/2015 9.9.9.9 30s stuck
11/12/2015 9.9.9.9 30s success
11/12/2015 9.9.9.8 30s success
11/12/2015 9.9.9.9 30s success
11/12/2015 9.9.9.9 30s stuck
11/12/2015 9.9.9.9 30s success
11/12/2015 9.9.9.9 30s Sync
11/12/2015 9.9.9.9 30s Sync
Output Data to print
11/11/2015 9.9.9.9 success 2
11/11/2015 9.9.9.8 stuck 1
11/11/2015 9.9.9.9 Sync 1
11/12/2015 9.9.9.9 success 4
11/12/2015 9.9.9.9 stuck 3
11/12/2015 9.9.9.9 Sync 2
11/12/2015 9.9.9.8 success 1
I tried loading the file with the following code, but I could not get it to reformat the data correctly.
file=open('logfile.txt' , 'r')
contents=file.readlines()
for line in contents:
Answer 0 (score: 1)
You can use collections.Counter for this.
from collections import Counter
data = ''' 11/11/2015 9.9.9.9 30s success
11/11/2015 9.9.9.8 30s stuck
11/11/2015 9.9.9.9 30s Sync
11/11/2015 9.9.9.9 30s success
11/12/2015 9.9.9.9 30s success
11/12/2015 9.9.9.9 30s stuck
11/12/2015 9.9.9.9 30s stuck
11/12/2015 9.9.9.9 30s success
11/12/2015 9.9.9.8 30s success
11/12/2015 9.9.9.9 30s success
11/12/2015 9.9.9.9 30s stuck
11/12/2015 9.9.9.9 30s success
11/12/2015 9.9.9.9 30s Sync
11/12/2015 9.9.9.9 30s Sync'''
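# Drop the third field (the duration) from each line, then count the
# remaining (date, IP, status) tuples.
counts = Counter(
    tuple(f for i, f in enumerate(l.split()) if i != 2)
    for l in data.split('\n')
)
print('\n'.join('%s %s' % (' '.join(k), v) for k, v in counts.items()))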
This outputs:
11/11/2015 9.9.9.9 success 2
11/11/2015 9.9.9.8 stuck 1
11/11/2015 9.9.9.9 Sync 1
11/12/2015 9.9.9.9 success 4
11/12/2015 9.9.9.9 stuck 3
11/12/2015 9.9.9.8 success 1
11/12/2015 9.9.9.9 Sync 2
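To apply the same idea to the file from the question rather than to an inline string, something like the sketch below might work. It assumes every useful line has exactly four whitespace-separated fields (date, IP, duration, status) and skips anything else. The helper names `count_statuses`, `format_counts`, and `print_report` are made up for illustration; they do not come from the answer above.

```python
from collections import Counter


def count_statuses(lines):
    """Count (date, ip, status) triples, ignoring the duration column."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # assumption: skip blank or malformed lines
        date, ip, _duration, status = fields
        counts[(date, ip, status)] += 1
    return counts


def format_counts(counts):
    """Turn the counter into 'date ip status count' rows, in first-seen order."""
    return [f"{date} {ip} {status} {n}" for (date, ip, status), n in counts.items()]


def print_report(path):
    """Read the log file at `path` and print one summary row per group."""
    with open(path) as f:
        for row in format_counts(count_statuses(f)):
            print(row)
```

Calling `print_report('logfile.txt')` would then print one row per distinct date, IP, and status combination. Rows appear in first-seen order on Python 3.7 and later, where dictionaries preserve insertion order.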
Answer 1 (score: 0)
Using the re module and defaultdict from collections:
data_in = """
11/11/2015 9.9.9.9 30s success
11/11/2015 9.9.9.8 30s stuck
11/11/2015 9.9.9.9 30s Sync
11/11/2015 9.9.9.9 30s success
11/12/2015 9.9.9.9 30s success
11/12/2015 9.9.9.9 30s stuck
11/12/2015 9.9.9.9 30s stuck
11/12/2015 9.9.9.9 30s success
11/12/2015 9.9.9.8 30s success
11/12/2015 9.9.9.9 30s success
11/12/2015 9.9.9.9 30s stuck
11/12/2015 9.9.9.9 30s success
11/12/2015 9.9.9.9 30s Sync
11/12/2015 9.9.9.9 30s Sync
"""
import re
from collections import defaultdict

# Each match is a (date, ip, duration, status) tuple.
groups = re.findall(r'([\d/]+)\s*([\d.]+)\s*([\ds]+)\s*([a-zA-Z]+)', data_in)
d = defaultdict(int)
for g in groups:
    d[g] += 1
for k, v in d.items():
    s = ' '.join(k)
    print(f'{s} {v}')
Output:
11/11/2015 9.9.9.9 30s success 2
11/11/2015 9.9.9.8 30s stuck 1
11/11/2015 9.9.9.9 30s Sync 1
11/12/2015 9.9.9.9 30s success 4
11/12/2015 9.9.9.9 30s stuck 3
11/12/2015 9.9.9.8 30s success 1
11/12/2015 9.9.9.9 30s Sync 2
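The output above still carries the 30s column, which the desired output in the question leaves out. One possible variation, sketched here under the assumption that the duration is always a single whitespace-free token, matches that field without capturing it so that it never becomes part of the key. The names `LINE_RE` and `summarize` are hypothetical and not part of the answer.

```python
import re
from collections import defaultdict

# One log line: date, IP, an uncaptured duration token, then the status.
LINE_RE = re.compile(
    r'^[ \t]*([\d/]+)[ \t]+([\d.]+)[ \t]+\S+[ \t]+([a-zA-Z]+)[ \t]*$',
    re.MULTILINE,
)


def summarize(text):
    """Return 'date ip status count' rows, grouping on date, IP and status."""
    counts = defaultdict(int)
    for date, ip, status in LINE_RE.findall(text):
        counts[(date, ip, status)] += 1
    return [f'{date} {ip} {status} {n}' for (date, ip, status), n in counts.items()]


sample = """
11/11/2015 9.9.9.9 30s success
11/11/2015 9.9.9.8 30s stuck
11/11/2015 9.9.9.9 30s success
"""
for row in summarize(sample):
    print(row)
# prints:
# 11/11/2015 9.9.9.9 success 2
# 11/11/2015 9.9.9.8 stuck 1
```

Anchoring the pattern to whole lines with `re.MULTILINE` keeps a match from spanning two lines, at the cost of silently ignoring lines that do not fit the assumed layout.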