I need to parse a log file that looks like this:
July 31 12:54:55 mycomputername updater[93309]: WARN Please send help I am stuck inside the internet
July 31 16:07:01 mycomputername process[11843]: ERROR Process failed
July 31 16:20:37 mycomputername dhcpclient[36371]: ERROR Unable to download more RAM
July 31 16:24:34 mycomputername updater[83956]: INFO Access permitted
July 31 16:43:19 mycomputername utility[31859]: ERROR Process failed
July 31 16:43:19 mycomputername CRON[31859]: ERROR: Failed to start CRON job due to hard partying.
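Each sample line seems to follow the layout `Month Day HH:MM:SS host process[pid]: message`. Assuming that layout holds for every line in the file (an assumption based only on the six lines above), a single regex with named groups can split a line into fields, and the `message` group is the part worth searching. `LOG_LINE` and `parse_line` are hypothetical names used only for this sketch.

```python
import re

# Assumed layout: "Month Day HH:MM:SS host process[pid]: message"
LOG_LINE = re.compile(
    r"^(?P<month>\w+) (?P<day>\d+) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<process>[^\[\s]+)\[(?P<pid>\d+)\]: (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not fit the assumed layout."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


fields = parse_line(
    "July 31 16:43:19 mycomputername CRON[31859]: "
    "ERROR: Failed to start CRON job due to hard partying."
)
print(fields["process"], fields["pid"])  # CRON 31859
print(fields["message"])
```

Searching only `fields["message"]` avoids accidental hits on the host or process name.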
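I need to create a list that stores all the patterns (user input) to search for. The list is named error_patterns, and it starts with a single pattern, "error", to pick out all the ERROR log lines.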
def error_search(log_file):
    read_line = True
    error_patterns = ["error"]
    error = input("What's the error? ").lower().split()
    for word in error:
        error_patterns.append(word)
    return error_patterns
If I search for the "CRON Failed to start" error (for example), the output is:
['error', 'cron', 'failed', 'to', 'start']
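Because `error_search` reads from `input()`, its list-building step is awkward to test on its own. One option, sketched here with a hypothetical `build_patterns` helper, is to pass the user's text in as an argument; the resulting list is built the same way as in the function above.

```python
def build_patterns(user_text):
    """Hypothetical input()-free variant: "error" followed by each lower-cased word."""
    return ["error"] + user_text.lower().split()


print(build_patterns("CRON Failed to start"))
# ['error', 'cron', 'failed', 'to', 'start']
```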
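Now my goal is to parse the log file line by line and match all of these words. I have the following code, but I'm struggling with the logic. Please suggest the best approach: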
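import re

# log_file, read_line and error_patterns come from the surrounding code
with open(log_file, mode='r', encoding='UTF-8') as f:
    returned_errors = []
    if read_line:
        # scan the file one line at a time
        for log in f:
            # str(error_patterns) is "['error', 'cron', ...]"; in a regex that is a
            # character class, so each match is a single character, not a word
            for match in re.finditer(str(error_patterns), log, re.S):
                match_text = match.group()
                returned_errors.append(match_text)
                # print(match_text)
    else:
        # scan the whole file as one string
        data = f.read()
        for match in re.finditer(str(error_patterns), data, re.S):
            match_text = match.group()
            returned_errors.append(match_text)
# no f.close() needed: the with block closes the file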
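The core problem in this attempt is `str(error_patterns)`: it turns the list into its printed form, and inside a regex the surrounding square brackets form a character class. The short demonstration below runs that pattern against a sample string.

```python
import re

error_patterns = ['error', 'cron', 'failed', 'to', 'start']

# str() gives the list's printed form: "['error', 'cron', 'failed', 'to', 'start']"
pattern = str(error_patterns)
print(pattern)

# Inside a regex, "[...]" is a character class: it matches ONE character from
# the set (quote, comma, space and the letters e, r, o, c, n, f, a, i, l, d, t, s).
# Matching is case-sensitive here, so "ERROR" and "F" do not match at all.
matches = re.findall(pattern, "ERROR: Failed")
print(matches)  # [' ', 'a', 'i', 'l', 'e', 'd']
```

So the loop collects stray characters instead of testing whether each word is present.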
My function should return only the last log line, because it is the only line that contains every word the user entered:
July 31 16:43:19 mycomputername CRON[31859]: ERROR: Failed to start CRON job due to hard partying.
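The requirement is a per-line test: keep a line only if every word in the list occurs in it, which needs no alternation pattern at all. A minimal sketch, assuming the message starts after the first `]: ` and that a "word" is a run of letters, digits or underscores (so `ERROR:` counts as `error`), tokenizes each message and checks set containment. `lines_with_all_words` is a hypothetical helper, and whole-word token matching is a swapped-in technique, not the regex approach from the question.

```python
import re


def lines_with_all_words(lines, patterns):
    """Return the lines whose message contains every pattern as a whole word.

    Assumes the message starts after the first "]: " (the "process[pid]: "
    prefix); the comparison is case-insensitive.
    """
    wanted = {p.lower() for p in patterns}
    result = []
    for line in lines:
        message = line.split("]: ", 1)[-1]
        # \w+ picks out words, so "ERROR:" becomes "error" and "CRON" becomes "cron"
        words = set(re.findall(r"\w+", message.lower()))
        if wanted <= words:  # every wanted word is present
            result.append(line)
    return result


log_lines = [
    "July 31 16:07:01 mycomputername process[11843]: ERROR Process failed",
    "July 31 16:43:19 mycomputername CRON[31859]: ERROR: Failed to start CRON job due to hard partying.",
]
for line in lines_with_all_words(log_lines, ["error", "cron", "failed", "to", "start"]):
    print(line)
```

Returning a list (rather than printing) matches the stated goal of a function that returns the matching lines.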
Answer 0 (score: 1)
Try:
import re
from collections import Counter

# given this list of words as input
error_patterns = ['error', 'cron', 'failed', 'to', 'start']

#OR the words with a | character into a string
searchPattern = "|".join(error_patterns)
#wrap as a non-capturing group so you have valid regex as (?:error|cron|failed|to|start)
searchPattern = r'(?i)(?:' + searchPattern + ')'

# open file and read into lines
with open('log_file.txt', 'r', encoding='UTF-8') as f:
    lines = f.read().splitlines() #remove newlines

# loop through each line
for line in lines:
    #split line so we are using only the message part at the end
    #this removes catching the 'CRON' from 'mycomputername CRON' in the match
    message = line.rsplit(']: ', 1)[-1]
    #if there is something (anything) matched...
    if re.search(searchPattern, message):
        #only if each word from the error_patterns list is matched at least once
        #(lower-case the matches so 'ERROR' and 'error' count as the same word)
        if len(Counter(m.lower() for m in re.findall(searchPattern, message)).keys()) == len(set(error_patterns)):
            #print the full line out
            print(line)
Output:
July 31 16:43:19 mycomputername CRON[31859]: ERROR: Failed to start CRON job due to hard partying.
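One caveat with this pattern: the alternatives are not anchored to word boundaries, so `to` and `start` also match inside longer words. The message below is made up for illustration. With the plain alternation it passes the "all five words found" check; wrapping the group in `\b` on both sides (and passing user-typed words through `re.escape`) rejects it.

```python
import re

error_patterns = ['error', 'cron', 'failed', 'to', 'start']
message = "ERROR: cron failed, will restart tomorrow"

# Pattern as in the answer: no word boundaries, so "start" matches inside
# "restart" and "to" matches inside "tomorrow".
loose = r'(?i)(?:' + "|".join(error_patterns) + ')'
# Same alternation wrapped in \b ... \b so only whole words match;
# re.escape guards against regex metacharacters in user input.
strict = r'(?i)\b(?:' + "|".join(map(re.escape, error_patterns)) + r')\b'

loose_found = {m.lower() for m in re.findall(loose, message)}
strict_found = {m.lower() for m in re.findall(strict, message)}

print(len(loose_found) == len(error_patterns))   # True  (false positive)
print(len(strict_found) == len(error_patterns))  # False
```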