Reading a text file and formatting the data using Python, dict or list

Time: 2018-07-14 06:32:36

Tags: python list dictionary

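Hi, I need help parsing the following data from a text file and reformatting it the way the output is shown. Basically, it should count each status and add that count as a new column, and it should also remove duplicate entries (those that have the same date and IP).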

Sample Input Data to read


    11/11/2015              9.9.9.9   30s        success
    11/11/2015              9.9.9.8   30s        stuck
    11/11/2015              9.9.9.9   30s        Sync
    11/11/2015              9.9.9.9   30s        success
    11/12/2015              9.9.9.9   30s        success
    11/12/2015              9.9.9.9   30s        stuck
    11/12/2015              9.9.9.9   30s        stuck
    11/12/2015              9.9.9.9   30s        success
    11/12/2015              9.9.9.8   30s        success
    11/12/2015              9.9.9.9   30s        success
    11/12/2015              9.9.9.9   30s        stuck
    11/12/2015              9.9.9.9   30s        success
    11/12/2015              9.9.9.9   30s        Sync
    11/12/2015              9.9.9.9   30s        Sync

Output Data to print



    11/11/2015              9.9.9.9   success    2
    11/11/2015              9.9.9.8   stuck      1
    11/11/2015              9.9.9.9   Sync       1
    11/12/2015              9.9.9.9   success    4
    11/12/2015              9.9.9.9   stuck      3
    11/12/2015              9.9.9.9   Sync       2
    11/12/2015              9.9.9.8   success    1

I tried loading the file with the following code, but it does not reformat the data correctly.

file=open('logfile.txt' , 'r')
contents=file.readlines()

for line in contents:

2 Answers:

Answer 0: (score: 1)

You can use collections.Counter for this:

from collections import Counter
data = '''    11/11/2015              9.9.9.9   30s        success
    11/11/2015              9.9.9.8   30s        stuck
    11/11/2015              9.9.9.9   30s        Sync
    11/11/2015              9.9.9.9   30s        success
    11/12/2015              9.9.9.9   30s        success
    11/12/2015              9.9.9.9   30s        stuck
    11/12/2015              9.9.9.9   30s        stuck
    11/12/2015              9.9.9.9   30s        success
    11/12/2015              9.9.9.8   30s        success
    11/12/2015              9.9.9.9   30s        success
    11/12/2015              9.9.9.9   30s        stuck
    11/12/2015              9.9.9.9   30s        success
    11/12/2015              9.9.9.9   30s        Sync
    11/12/2015              9.9.9.9   30s        Sync'''
counts = Counter(
    # Drop the third column (the "30s" duration) and count what remains.
    tuple(f for i, f in enumerate(line.split()) if i != 2)
    for line in data.split('\n')
)
print('\n'.join('%s %s' % (' '.join(k), v) for k, v in counts.items()))

This outputs:

11/11/2015 9.9.9.9 success 2
11/11/2015 9.9.9.8 stuck 1
11/11/2015 9.9.9.9 Sync 1
11/12/2015 9.9.9.9 success 4
11/12/2015 9.9.9.9 stuck 3
11/12/2015 9.9.9.8 success 1
11/12/2015 9.9.9.9 Sync 2
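
The answer above counts rows from an inline string, while the question reads from a file. As a rough sketch of the same Counter idea applied to lines from a file, the following may help. The helper names `count_statuses`, `format_counts` and `count_statuses_in_file`, the column widths, and the choice to skip lines that do not have exactly four fields are all assumptions made for illustration, not part of the original answer.

```python
from collections import Counter


def count_statuses(lines):
    """Count (date, ip, status) triples, ignoring the duration column."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # assumption: silently skip blank or malformed lines
        date, ip, _duration, status = fields
        counts[(date, ip, status)] += 1
    return counts


def format_counts(counts):
    """Return one aligned line per distinct (date, ip, status) triple."""
    # The column widths are arbitrary choices for readability.
    return [
        f"{date:<12}{ip:<12}{status:<10}{n}"
        for (date, ip, status), n in counts.items()
    ]


def count_statuses_in_file(path):
    """Open a log file and count its rows; a file object yields lines."""
    with open(path) as fh:
        return count_statuses(fh)
```

With the file name from the question, something like `print('\n'.join(format_counts(count_statuses_in_file('logfile.txt'))))` should print one row per distinct date, IP and status, in first-seen order on Python 3.7 and later.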

Answer 1: (score: 0)

Using the re module together with defaultdict from collections:

data_in = """
    11/11/2015              9.9.9.9   30s        success
    11/11/2015              9.9.9.8   30s        stuck
    11/11/2015              9.9.9.9   30s        Sync
    11/11/2015              9.9.9.9   30s        success
    11/12/2015              9.9.9.9   30s        success
    11/12/2015              9.9.9.9   30s        stuck
    11/12/2015              9.9.9.9   30s        stuck
    11/12/2015              9.9.9.9   30s        success
    11/12/2015              9.9.9.8   30s        success
    11/12/2015              9.9.9.9   30s        success
    11/12/2015              9.9.9.9   30s        stuck
    11/12/2015              9.9.9.9   30s        success
    11/12/2015              9.9.9.9   30s        Sync
    11/12/2015              9.9.9.9   30s        Sync
"""

import re
from collections import defaultdict

# Raw string so that \d and \s reach the regex engine unchanged.
groups = re.findall(r'([\d/]+)\s*([\d.]+)\s*([\ds]+)\s*([a-zA-Z]+)', data_in)
d = defaultdict(int)
for g in groups:
    # Each match is already a (date, ip, duration, status) tuple.
    d[g] += 1

for k, v in d.items():
    print(f"{' '.join(k)} {v}")

Output:

11/11/2015 9.9.9.9 30s success 2
11/11/2015 9.9.9.8 30s stuck 1
11/11/2015 9.9.9.9 30s Sync 1
11/12/2015 9.9.9.9 30s success 4
11/12/2015 9.9.9.9 30s stuck 3
11/12/2015 9.9.9.8 30s success 1
11/12/2015 9.9.9.9 30s Sync 2
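
Note that this output keeps the 30s column, whereas the output requested in the question omits it. A possible variation, sketched here under assumptions, uses named groups and leaves the duration out of the key. The names `LINE_RE` and `count_without_duration` are hypothetical, and the pattern assumes the duration is always digits followed by "s", as in the sample data.

```python
import re
from collections import defaultdict

# Named groups: date, IP address, duration (matched but not kept), status.
# Assumption: the duration always looks like "<digits>s", as in the sample.
LINE_RE = re.compile(
    r"(?P<date>[\d/]+)\s+(?P<ip>[\d.]+)\s+(?P<duration>\d+s)\s+(?P<status>[A-Za-z]+)"
)


def count_without_duration(text):
    """Count (date, ip, status) triples found in text, ignoring duration."""
    counts = defaultdict(int)
    for m in LINE_RE.finditer(text):
        counts[(m["date"], m["ip"], m["status"])] += 1
    return dict(counts)
```

Iterating over `count_without_duration(data_in).items()` and printing each key joined with its count should then give rows in the three-column-plus-count shape the question asks for.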