Question

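I have been working on a program that assists with log analysis. It uses a regular expression to find error or failure messages and prints them to a new .txt file. However, it would be much more useful if the program also included the 4 lines above and below each match. I can't figure out how to do this! Here is part of the existing program: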

import re

def error_finder(filepath):
    source = open(filepath, "r").readlines()
    error_logs = set()
    my_data = []
    for line in source:
        line = line.strip()
        # exp is the regex pattern, defined elsewhere in the program
        if re.search(exp, line):
            error_logs.add(line)

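I assume something needs to be added to the last line, but I have been working on this for a while and either I am not applying myself fully or I just can't figure it out.

Any advice or help is appreciated.

Thanks!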

Answer 1

Why Python?

grep -C4 '^your_regex$' logfile > outfile.txt

Answer 2

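A few comments:

I'm not sure why error_logs is a set rather than a list. A set drops duplicate lines and does not preserve their order.
Using readlines() reads the whole file into memory, which is inefficient for large files. You should be able to iterate over the file one line at a time.
exp (which you pass to re.search) is not defined anywhere, but I assume it is somewhere else in your code.

In any case, here is complete code that does what you want without reading the whole file into memory. It also preserves the order of the input lines.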

import re
from collections import deque

exp = r'\d'
# raw string so the backslash reaches the regex engine as-is;
# matches any digit, change to what you need

def error_finder(filepath, context_lines=4):
  error_logs = []

  # holds up to context_lines of the most recent lines not yet output
  buffer = deque(maxlen=context_lines)
  lines_after = 0

  # the with block closes the file once the loop is done
  with open(filepath, 'r') as source:
    for line in source:
      line = line.strip()
      if re.search(exp, line):
        # add previous lines first
        for prev_line in buffer:
          error_logs.append(prev_line)
        # clear the buffer
        buffer.clear()
        # add current line
        error_logs.append(line)
        # schedule lines that follow to be added too
        lines_after = context_lines
      elif lines_after > 0:
        # a line that matched the regex came not so long ago
        lines_after -= 1
        error_logs.append(line)
      else:
        buffer.append(line)

  # maybe do something with error_logs? I'll just return it
  return error_logs
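
As a possible variation on the code above, here is a rough sketch that accepts any iterable of lines and yields the matches with their context lazily, so the output can be written straight to a new file instead of being collected in a list first. The names iter_with_context and write_matches are made up for this illustration and are not part of the original answer; the logic is the same deque-based approach.

```python
import re
from collections import deque


def iter_with_context(lines, pattern, context_lines=4):
    """Yield each matching line plus up to context_lines before and after it."""
    regex = re.compile(pattern)
    before = deque(maxlen=context_lines)  # recent lines not yet emitted
    lines_after = 0                       # trailing context lines still owed

    for line in lines:
        line = line.rstrip("\n")
        if regex.search(line):
            # flush the buffered leading context, then the match itself
            for prev_line in before:
                yield prev_line
            before.clear()
            yield line
            lines_after = context_lines
        elif lines_after > 0:
            # still inside the trailing context of an earlier match
            lines_after -= 1
            yield line
        else:
            before.append(line)


def write_matches(in_path, out_path, pattern, context_lines=4):
    """Hypothetical helper: copy matches with context from in_path to out_path."""
    with open(in_path, "r") as source, open(out_path, "w") as out:
        for line in iter_with_context(source, pattern, context_lines):
            out.write(line + "\n")
```

Because overlapping contexts are handled by the same counter, two matches that sit close together produce one continuous block rather than repeated lines.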

Answer 3

I suggest using an index-based loop instead of a for-each loop. Try this:

error_logs = list()
for i in range(len(source)):
    line = source[i].strip()
    if re.search(exp, line):
        error_logs.append((line,i-4,i+4))

In this case, your error log will contain tuples of ('error line', line index - 4, line index + 4), so you can fetch those lines from source later.
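
One thing to watch when fetching the lines later: near the top of the file i-4 is negative, and a negative slice start counts from the end of the list, so the indices need clamping. A minimal sketch of that lookup, assuming source is the list returned by readlines() and error_logs holds the tuples built above (the function name lines_for_spans is hypothetical):

```python
def lines_for_spans(source, spans):
    """Return the lines covered by each (line, start, end) tuple, clamped to the file."""
    blocks = []
    for _line, start, end in spans:
        first = max(start, 0)              # i-4 is negative near the top of the file
        last = min(end, len(source) - 1)   # i+4 can run past the last line
        blocks.append([s.rstrip("\n") for s in source[first:last + 1]])
    return blocks
```

Each entry in the result is the list of lines around one match, including the match itself. Unlike the buffered approach, matches that are close together will repeat their shared context lines.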

Including the lines surrounding a match from a text file in the output, using Python 2.7.3
