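I want to find all logs with "similar" error messages and count how many times each kind occurs. The problem is that error messages often contain dynamic parts.
For example, given error messages such as:
"Didn't accept value 3 for parameter foo"
"Didn't accept value 6 for parameter bar"
"Could not open file 'my_file.json' because: it does not exist"
"Could not open file 'my_other_file.json' because: it is not formatted correctly"
I would like to count the occurrences of these logs and end up with output like this: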
"Didn't accept value * for parameter *" -- 2 counts
"Could not open file * because: it does not exist" -- 2 counts
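The problem with writing regular expressions is that the log messages come from many teams and in many different formats. I would have to write dozens of regexes to get the counts, and I would still be left with a long tail of uncounted log messages.
Is there some way to detect when a log has dynamic parts and aggregate on that basis?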
Answer 0 (score: 0)
Do you mean something like this?
import re

logs = [
    "Didn't accept value 3 for parameter foo",
    "Didn't accept value 6 for parameter bar",
    "Could not open file 'my_file.json' because: it does not exist",
    "Could not open file 'my_other_file.json' because: it is not formatted correctly",
]

# One counter per hand-written template; "*" marks a dynamic part.
counts = {
    "Didn't accept value * for parameter *": 0,
    "Could not open file * because: *": 0,
}

for log in logs:
    # A numeric value followed by a parameter name.
    s = re.search(r"Didn't accept value \d+ for parameter \w+", log)
    if s:
        counts["Didn't accept value * for parameter *"] += 1
        continue
    # A single-quoted file name followed by any reason.
    s = re.search(r"Could not open file '[^']+' because: \w+", log)
    if s:
        counts["Could not open file * because: *"] += 1
        continue

print(counts)
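The code above still needs one regex per message format, which is exactly what the question wants to avoid. A rough sketch of a more generic approach follows: mask tokens that look dynamic (numbers and quoted strings), then merge templates that have the same number of words and differ in at most one position. The function names `mask`, `merge` and `summarize` and the `max_diff` threshold are my own assumptions for illustration, not part of any library, and the heuristic will need tuning on real logs.

```python
import re

# Quoted strings, but not apostrophes inside words such as "Didn't".
QUOTED = re.compile(r"""(?<!\w)'[^']*'(?!\w)|(?<!\w)"[^"]*"(?!\w)""")
# Standalone integers or decimals.
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def mask(message):
    """Replace obviously dynamic parts (quoted strings, numbers) with '*'."""
    message = QUOTED.sub("*", message)
    return NUMBER.sub("*", message)


def merge(a, b, max_diff=1):
    """Merge two templates with equal word counts that differ in at most
    max_diff positions; return None if they are too different."""
    ta, tb = a.split(), b.split()
    if len(ta) != len(tb):
        return None
    diff = [i for i, (x, y) in enumerate(zip(ta, tb)) if x != y]
    if len(diff) > max_diff:
        return None
    for i in diff:
        ta[i] = "*"
    return " ".join(ta)


def summarize(logs, max_diff=1):
    """Return a dict mapping each inferred template to its count."""
    counts = {}
    for log in logs:
        template = mask(log)
        for existing in list(counts):
            merged = merge(existing, template, max_diff)
            if merged is not None:
                # Fold the old template's count into the merged template.
                n = counts.pop(existing)
                counts[merged] = counts.get(merged, 0) + n + 1
                break
        else:
            counts[template] = 1
    return counts


logs = [
    "Didn't accept value 3 for parameter foo",
    "Didn't accept value 6 for parameter bar",
    "Could not open file 'my_file.json' because: it does not exist",
    "Could not open file 'my_other_file.json' because: it is not formatted correctly",
]

for template, count in summarize(logs).items():
    print(f'"{template}" -- {count} counts')
```

On the sample logs this groups the two "Didn't accept value" messages under one template with a count of 2, while the two "Could not open file" messages stay separate because their reasons have different word counts. Raising `max_diff` merges more aggressively but risks lumping unrelated short messages together. This kind of problem is usually called log template mining (or log parsing), and dedicated tools for it exist if the heuristic above turns out to be too crude.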