How can I extract some details from the following script log?
Input.txt
#Date: 04-Jul-2020 01:55:54
SL|INFO|-----sql_query------
Update Table1 set sts='Process data--Started' where batch_id=30;
-----sql_query_end------
#Date: 04-Jul-2020 01:55:54
--
#Date: 04-Jul-2020 01:55:54
SL|INFO|-----sql_query------
Update Table2 set fm_sts='Process data--Started' where batch_id=30;
-----sql_query_end------
#Date: 04-Jul-2020 01:55:54
--
#Date: 04-Jul-2020 02:08:14
SL|INFO|-----sql_query------
Update Table1 set sts='Process data--Complete' where batch_id=30;
-----sql_query_end------
#Date: 04-Jul-2020 02:08:14
--
#Date: 04-Jul-2020 02:08:14
SL|INFO|-----sql_query------
Update Table2 set fm_sts='Process data--Complete' where batch_id=30;
-----sql_query_end------
#Date: 04-Jul-2020 02:08:15
Required output:
Batch_Id 30,Process data--Started at 04-Jul-2020 01:55:54,Process data--Completed at 04-Jul-2020 02:08:14
What I have tried:
from itertools import groupby

with open('input.txt') as f_input:
    data = [list(g) for k, g in groupby(f_input, lambda x: not x.startswith("Pulling Keys--Started")) if k]
data = [''.join(x) for x in data]
print(data)
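The attempt above keys groupby on lines starting with "Pulling Keys--Started", which never occurs in Input.txt, so the whole file ends up in a single group. A minimal sketch, assuming every entry is separated by a line that is exactly "--" (as in the sample log): key on that separator instead and keep only the non-separator runs. LOG and split_entries are illustrative names, and LOG is just a two-entry excerpt of the log.

```python
from itertools import groupby

# Illustrative excerpt of Input.txt: two entries separated by a "--" line.
LOG = """#Date: 04-Jul-2020 01:55:54
SL|INFO|-----sql_query------
Update Table1 set sts='Process data--Started' where batch_id=30;
-----sql_query_end------
#Date: 04-Jul-2020 01:55:54
--
#Date: 04-Jul-2020 02:08:14
SL|INFO|-----sql_query------
Update Table1 set sts='Process data--Complete' where batch_id=30;
-----sql_query_end------
#Date: 04-Jul-2020 02:08:14
"""


def split_entries(lines):
    """Return one list of lines per entry, dropping the "--" separators.

    The key is False only on separator lines, so groupby alternates between
    runs of entry lines (key True) and separators (key False).
    """
    return [
        [line.rstrip("\n") for line in grp]
        for is_entry, grp in groupby(lines, key=lambda line: line.strip() != "--")
        if is_entry
    ]


entries = split_entries(LOG.splitlines(keepends=True))
for entry in entries:
    # First line is the opening '#Date:' header, third line is the SQL.
    print(entry[0], "|", entry[2])
```

The same function accepts an open file object directly (split_entries(f_input)), since iterating a file yields lines with their trailing newline.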
Please let me know what steps I need to follow to get the required output.
Answer 0 (score: 0)
We loop over the log lines:
import re

with open('input.txt') as f_input:
    lines = f_input.readlines()

start_date = None
end_date = None
DATE_HEADER = '#Date: '
BATCH_ID_FORMAT = r'batch_id=(\d+)'
batch_id = None
for l in lines:
    if l.startswith(DATE_HEADER):
        parsed_date = l.replace(DATE_HEADER, '').strip()
        # The first '#Date:' header is taken as the start time...
        if not start_date:
            start_date = parsed_date
        # ...and every header overwrites end_date, so it ends up as the
        # last timestamp in the file (02:08:15 here, not 02:08:14).
        end_date = parsed_date
    elif not batch_id:
        m = re.search(BATCH_ID_FORMAT, l)
        if m:
            batch_id = m.group(1)  # the captured digits, e.g. '30'
# Batch_Id 30, Process data--Started at 04-Jul-2020 01:55:54, Process data--Completed at 04-Jul-2020 02:08:15
print(f'Batch_Id {batch_id}, Process data--Started at {start_date}, Process data--Completed at {end_date}')
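The loop above reports the last timestamp in the file (02:08:15), while the required output wants 02:08:14, the header that opens the first "Complete" entry. One possible variant, sketched under the assumption that each SQL line always follows its entry's opening #Date: header: remember the most recent header and record it against the status found on the SQL line. summarize, STATUS_RE and BATCH_ID_RE are hypothetical names; the log from the question is inlined so the sketch runs without input.txt.

```python
import re

# Copy of the Input.txt log from the question, inlined so the sketch runs standalone.
LOG = """#Date: 04-Jul-2020 01:55:54
SL|INFO|-----sql_query------
Update Table1 set sts='Process data--Started' where batch_id=30;
-----sql_query_end------
#Date: 04-Jul-2020 01:55:54
--
#Date: 04-Jul-2020 01:55:54
SL|INFO|-----sql_query------
Update Table2 set fm_sts='Process data--Started' where batch_id=30;
-----sql_query_end------
#Date: 04-Jul-2020 01:55:54
--
#Date: 04-Jul-2020 02:08:14
SL|INFO|-----sql_query------
Update Table1 set sts='Process data--Complete' where batch_id=30;
-----sql_query_end------
#Date: 04-Jul-2020 02:08:14
--
#Date: 04-Jul-2020 02:08:14
SL|INFO|-----sql_query------
Update Table2 set fm_sts='Process data--Complete' where batch_id=30;
-----sql_query_end------
#Date: 04-Jul-2020 02:08:15
"""

DATE_HEADER = "#Date: "
STATUS_RE = re.compile(r"Process data--(Started|Complete)")
BATCH_ID_RE = re.compile(r"batch_id=(\d+)")


def summarize(lines):
    """Build the summary line from the header that opens each entry."""
    batch_id = None
    times = {}           # status ('Started'/'Complete') -> first timestamp seen
    current_date = None  # timestamp of the most recent '#Date:' header
    for line in lines:
        if line.startswith(DATE_HEADER):
            current_date = line[len(DATE_HEADER):].strip()
            continue
        status = STATUS_RE.search(line)
        if status:
            # current_date is this entry's opening header here, because the
            # closing header comes after the SQL line; setdefault keeps the first.
            times.setdefault(status.group(1), current_date)
        m = BATCH_ID_RE.search(line)
        if m and batch_id is None:
            batch_id = m.group(1)
    return (f"Batch_Id {batch_id},Process data--Started at {times.get('Started')},"
            f"Process data--Completed at {times.get('Complete')}")


print(summarize(LOG.splitlines()))
```

With a real file, summarize(f_input) works the same way, since the header check uses startswith and the timestamp is stripped of its newline.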
Answer 1 (score: 0)
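Parse the log entries for each batch, sort them and group them by batch_id, then take the start time from the first entry in each group and the end time from the last entry, and print the result.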
>>> from itertools import groupby
>>> import re
>>>
>>> with open('input.txt') as f_input:
...     data = f_input.read()
...
>>> fields = re.findall(r'#Date: (\S* \S*).*?Process data--(Started|Complete)'
...                     r'.*?batch_id=(\d+);.*?#Date:', data, re.DOTALL)
>>>
>>> key = lambda t: t[2]
>>> for batch_id, grp in groupby(sorted(fields, key=key), key):
...     start, *_, end = grp
...     print(f'Batch_Id {batch_id},Process data--Started at {start[0]},'
...           f'Process data--Completed at {end[0]}')
...
Batch_Id 30,Process data--Started at 04-Jul-2020 01:55:54,Process data--Completed at 04-Jul-2020 02:08:14
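The sort-then-groupby step only matters once the log holds more than one batch. A sketch wrapping the same regex in a function, assuming entries for different batches may be interleaved in the file; summarize_batches is a hypothetical name, and the batch 31 entries in SAMPLE are invented purely for illustration.

```python
import re
from itertools import groupby

# Same pattern as the answer: one (date, status, batch_id) tuple per entry.
ENTRY_RE = re.compile(
    r"#Date: (\S* \S*).*?Process data--(Started|Complete)"
    r".*?batch_id=(\d+);.*?#Date:",
    re.DOTALL,
)

# Hypothetical log with two interleaved batches (batch 31 is made up).
SAMPLE = """#Date: 04-Jul-2020 01:55:54
SL|INFO|-----sql_query------
Update Table1 set sts='Process data--Started' where batch_id=30;
-----sql_query_end------
#Date: 04-Jul-2020 01:55:54
--
#Date: 04-Jul-2020 02:00:00
SL|INFO|-----sql_query------
Update Table1 set sts='Process data--Started' where batch_id=31;
-----sql_query_end------
#Date: 04-Jul-2020 02:00:00
--
#Date: 04-Jul-2020 02:08:14
SL|INFO|-----sql_query------
Update Table1 set sts='Process data--Complete' where batch_id=30;
-----sql_query_end------
#Date: 04-Jul-2020 02:08:14
--
#Date: 04-Jul-2020 02:10:00
SL|INFO|-----sql_query------
Update Table1 set sts='Process data--Complete' where batch_id=31;
-----sql_query_end------
#Date: 04-Jul-2020 02:10:00
"""


def batch_key(entry):
    # Numeric sort, so batch 7 would come before batch 30.
    return int(entry[2])


def summarize_batches(text):
    """Return one summary line per batch_id, ordered by numeric batch_id."""
    fields = ENTRY_RE.findall(text)
    lines = []
    # sorted() is stable, so each batch's entries keep their file order.
    for batch_id, grp in groupby(sorted(fields, key=batch_key), batch_key):
        entries = list(grp)
        start, end = entries[0], entries[-1]
        lines.append(f"Batch_Id {batch_id},Process data--Started at {start[0]},"
                     f"Process data--Completed at {end[0]}")
    return lines


for line in summarize_batches(SAMPLE):
    print(line)
```

Indexing the materialised list also avoids the ValueError that start, *_, end = grp raises when a batch has only one entry (start and end are then simply the same entry).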