Need to create a regex pattern from user input in Python

Time: 2021-08-01 22:06:04

Tags: python regex input iteration logfile

I need to parse a log file that looks like this:

July 31 12:54:55 mycomputername updater[93309]: WARN Please send help I am stuck inside the internet

July 31 16:07:01 mycomputername process[11843]: ERROR Process failed

July 31 16:20:37 mycomputername dhcpclient[36371]: ERROR Unable to download more RAM

July 31 16:24:34 mycomputername updater[83956]: INFO Access permitted

July 31 16:43:19 mycomputername utility[31859]: ERROR Process failed

July 31 16:43:19 mycomputername CRON[31859]: ERROR: Failed to start CRON job due to hard partying.

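I need to create a list that stores all the patterns to search for (taken from user input). The list is named error_patterns, and it initially contains one pattern, "error", to pick out all the ERROR log lines.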

def error_search(log_file):
   read_line = True
   error_patterns = ["error"]

   error = input("What's the error? ").lower().split()

   for word in error:
      error_patterns.append(word)
   return error_patterns

If I search for the "CRON Failed to start" error (for example), the output will be:

['error', 'cron', 'failed', 'to', 'start']

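Now my goal is to parse the log file line by line and match all of these words. I have the following code, but I am struggling with the logic. Please suggest the best approach: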

with open(log_file, mode = 'r', encoding='UTF-8') as f:
    returned_errors = []
    if read_line == True:
        for log in f:
            for match in re.finditer(str(error_patterns), log, re.S):
                match_text = match.group()
                returned_errors.append(match_text)
                # print(match_text)
    else:
        data = f.read()
        for match in re.finditer(str(error_patterns), data, re.S):
            match_text = match.group()
            returned_errors.append(match_text)
f.close()

My function should return only the last log line, because it is the only line that contains every word from the user's input.

July 31 16:43:19 mycomputername CRON[31859]: ERROR: Failed to start CRON job due to hard partying.

1 answer:

Answer 0 (score: 1)

Try:

import re
from collections import Counter

# given this list of words as input
error_patterns = ['error', 'cron', 'failed', 'to', 'start']

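#escape each word (user input may contain regex metacharacters), then OR them with a | character into a string
searchPattern = "|".join(re.escape(word) for word in error_patterns)

#wrap as a case-insensitive non-capturing group so you have valid regex as (?i)(?:error|cron|failed|to|start)
searchPattern = r'(?i)(?:' + searchPattern + ')'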

# open file and read into lines
with open('log_file.txt', 'r', encoding='UTF-8') as f:
    lines = f.read().splitlines() #remove newlines

# loop through each line
for line in lines:
    #split line so we are using only the message part at the end
    #this removes catching the 'CRON' from 'mycomputername CRON' in the match
    message = line.rsplit(']: ')[-1]
    #find every match and lowercase it, so 'ERROR' and 'error' count as the same word
    found = Counter(m.lower() for m in re.findall(searchPattern, message))
    #only if each word from the error_patterns list is matched at least once
    if len(found) == len(set(error_patterns)):
        #print the full line out
        print(line)

Output:

July 31 16:43:19 mycomputername CRON[31859]: ERROR: Failed to start CRON job due to hard partying.
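
As a side note on why the code in the question finds the wrong things: str(error_patterns) produces the text "['error', 'cron', 'failed', 'to', 'start']", which the regex engine reads as one character class, so it matches single characters instead of whole words. An alternative to the counting approach above is to compile one pattern per word and require all of them to match. The following is only a minimal sketch under some assumptions: the helper names build_word_patterns and matching_lines are hypothetical, the log lines are held in a list instead of being read from a file, and every search word is assumed to start and end with a letter or digit so that \b word boundaries behave as expected.

```python
import re

def build_word_patterns(words):
    """Compile one case-insensitive, whole-word pattern per search word."""
    # re.escape keeps user input from being interpreted as regex syntax
    return [re.compile(r"\b" + re.escape(w) + r"\b", re.IGNORECASE) for w in words]

def matching_lines(lines, words):
    """Return the lines whose message part contains every search word."""
    patterns = build_word_patterns(words)
    matches = []
    for line in lines:
        # keep only the text after the first "]: ", i.e. the message itself
        message = line.split("]: ", 1)[-1]
        if all(p.search(message) for p in patterns):
            matches.append(line)
    return matches

# sample data copied from the question
sample_log = [
    "July 31 12:54:55 mycomputername updater[93309]: WARN Please send help I am stuck inside the internet",
    "July 31 16:07:01 mycomputername process[11843]: ERROR Process failed",
    "July 31 16:20:37 mycomputername dhcpclient[36371]: ERROR Unable to download more RAM",
    "July 31 16:24:34 mycomputername updater[83956]: INFO Access permitted",
    "July 31 16:43:19 mycomputername utility[31859]: ERROR Process failed",
    "July 31 16:43:19 mycomputername CRON[31859]: ERROR: Failed to start CRON job due to hard partying.",
]

# same shape as the list built by error_search() in the question
words = ["error"] + "CRON Failed to start".lower().split()

for line in matching_lines(sample_log, words):
    print(line)
```

With the sample data this should print only the final CRON line. Because each word is checked separately with \b boundaries, a word such as "to" no longer matches inside a longer word, which the single alternation pattern would allow.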