Question

I'm new to Python and I've run into a serious problem that I can't solve.

I have several log files that all share the same structure:

[timestamp] [level] [source] message

For example:

[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] error message

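I need to write a program in pure Python that merges these log files into a single file and then sorts the merged file by timestamp. After that, I want to print the result (the contents of the merged file) to STDOUT (the console).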

I don't understand how to do this, so any help would be appreciated. Is it possible?

Answer 1

You can do it like this:

import fileinput
import re
from time import strptime

f_names = ['1.log', '2.log'] # names of log files
lines = list(fileinput.input(f_names))
t_fmt = '%a %b %d %H:%M:%S %Y' # format of time stamps
t_pat = re.compile(r'\[(.+?)\]') # pattern to extract timestamp
for l in sorted(lines, key=lambda l: strptime(t_pat.search(l).group(1), t_fmt)):
    print(l, end='')  # each line already ends with a newline
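
To make the sort key concrete, here is a minimal sketch of what the regular expression and strptime do with the sample line from the question. The helper name timestamp_key is made up for illustration, and the sketch assumes an English (or C) locale, because %a and %b match locale-dependent day and month names.

```python
import re
from time import strptime

T_FMT = '%a %b %d %H:%M:%S %Y'    # same format string as in the answer
T_PAT = re.compile(r'\[(.+?)\]')  # non-greedy, so only the first [...] group is captured


def timestamp_key(line):
    """Hypothetical helper: return the parsed timestamp of one log line."""
    return strptime(T_PAT.search(line).group(1), T_FMT)


sample = '[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] error message\n'
key = timestamp_key(sample)
print(key.tm_year, key.tm_mon, key.tm_mday, key.tm_hour, key.tm_min, key.tm_sec)
```

The returned struct_time compares field by field (year, month, day, ...), which is why it works directly as a sort key.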

Answer 2

First off, you'll want to use the fileinput module to get the data from multiple files, like this:

import fileinput

data = fileinput.FileInput()  # reads the files named on the command line
for line in data:
    print(line, end='')

That prints all the lines together. You also want to sort them, which you can do with the built-in sorted function.

If your lines start with something like [2011-07-20 19:20:12], you're golden, because that format needs nothing beyond a plain alphanumeric sort:

data = fileinput.FileInput()
for line in sorted(data):
    print(line, end='')
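
As a small illustration of why the timestamp format matters, the sketch below sorts two made-up sets of lines as plain strings. The sample lines are invented for the demonstration; only the two timestamp formats come from this thread.

```python
# ISO-like timestamps: most significant field first, zero padded,
# so plain string order is also chronological order.
iso_lines = [
    '[2011-07-20 19:20:12] second\n',
    '[2011-07-19 08:00:00] first\n',
    '[2011-12-01 00:00:01] third\n',
]
iso_sorted = sorted(iso_lines)

# Apache-style timestamps start with the weekday name, so a plain sort
# compares "Fri" with "Wed" and puts the later entry first.
apache_lines = [
    '[Wed Oct 11 14:32:52 2000] earlier\n',
    '[Fri Oct 13 09:00:00 2000] later\n',
]
apache_sorted = sorted(apache_lines)

print(''.join(iso_sorted), end='')
print(''.join(apache_sorted), end='')
```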

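However, your timestamps begin with a weekday name, which does not sort chronologically as text, so you need something more complex: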

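import fileinput
from datetime import datetime
from functools import cmp_to_key

T_FMT = '[%a %b %d %H:%M:%S %Y'  # matches "[Wed Oct 11 14:32:52 2000"

def compareDates(line1, line2):
    # parse the leading timestamps into datetime objects
    date1 = datetime.strptime(line1.split(']')[0], T_FMT)
    date2 = datetime.strptime(line2.split(']')[0], T_FMT)
    # then use those for the sorting: negative, zero, or positive result
    return (date1 > date2) - (date1 < date2)

data = fileinput.FileInput()
# Python 3 dropped sorted(cmp=...); cmp_to_key adapts the comparison function
for line in sorted(data, key=cmp_to_key(compareDates)):
    print(line, end='')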

For bonus points, you can even do

data = fileinput.FileInput(openhook=fileinput.hook_compressed)

which lets you read gzip-compressed log files as well.

The usage would then be:

$ python yourscript.py access.log.1 access.log.*.gz

or something similar.
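
If you would rather not depend on the fileinput hook, a rough alternative is to pick the opener yourself based on the file extension. This is only a sketch: open_log and read_lines are hypothetical helper names, and it assumes compressed logs always end in .gz.

```python
import gzip


def open_log(path):
    """Hypothetical helper: open a plain or gzip-compressed log as text."""
    if path.endswith('.gz'):
        return gzip.open(path, 'rt')  # 'rt' yields str lines, like open()
    return open(path, 'r')


def read_lines(paths):
    """Collect the lines of every given log file into one list."""
    lines = []
    for path in paths:
        with open_log(path) as fh:
            lines.extend(fh)
    return lines
```

The resulting list can then be handed to sorted with whichever key function fits your timestamp format.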

Answer 3

As for the sort key function:

def sort_key(line):
    return datetime.strptime(line.split(']')[0], '[%a %b %d %H:%M:%S %Y')

This should be passed as the key argument to sort or sorted, not as cmp. It is faster this way.

Oh, and you should add

from datetime import datetime

to your code to make this work.
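
For completeness, here is a small self-contained sketch of that key function in use. The three log lines are made up for the example. The speed difference comes from the fact that a key function runs once per line, whereas a comparison function runs once per comparison made during the sort.

```python
from datetime import datetime


def sort_key(line):
    # Everything before the first ']' is "[<timestamp>"; parse it as a datetime.
    return datetime.strptime(line.split(']')[0], '[%a %b %d %H:%M:%S %Y')


lines = [
    '[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] third\n',
    '[Tue Oct 10 23:59:59 2000] [notice] [client 127.0.0.1] first\n',
    '[Wed Oct 11 09:15:00 2000] [error] [client 127.0.0.1] second\n',
]

lines.sort(key=sort_key)  # sorts in place; sorted(lines, key=sort_key) returns a copy
for line in lines:
    print(line, end='')
```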

Answer 4

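Read the lines of both files into a list (they are now merged), provide a user-defined comparison function that converts the timestamps to seconds since the epoch, call sort with that comparison function, and write the lines to the merged file...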

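import time
from functools import cmp_to_key

T_FMT = '[%a %b %d %H:%M:%S %Y'  # matches "[Wed Oct 11 14:32:52 2000"

def compare_func(a, b):
    # comparison code: turn each timestamp into seconds since the epoch
    ta = time.mktime(time.strptime(a.split(']')[0], T_FMT))
    tb = time.mktime(time.strptime(b.split(']')[0], T_FMT))
    return (ta > tb) - (ta < tb)  # negative, zero, or positive


lst = []

for line in open("file_1.log", "r"):
    lst.append(line)

for line in open("file_2.log", "r"):
    lst.append(line)

# Python 3 removed sort(cmp=...); cmp_to_key wraps compare_func as a key
lst.sort(key=cmp_to_key(compare_func))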

Something like that should do it.
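
The last step in that description, writing the merged file and showing it on the console, could look roughly like the sketch below. The names epoch_seconds and merge_logs are hypothetical, and it sorts with a key function instead of a comparison function, which is a swap on my part rather than what the snippet above does.

```python
import sys
import time

T_FMT = '[%a %b %d %H:%M:%S %Y'  # assumes the timestamp format from the question


def epoch_seconds(line):
    """Hypothetical helper: leading timestamp -> seconds since the epoch (local time)."""
    return time.mktime(time.strptime(line.split(']')[0], T_FMT))


def merge_logs(in_paths, out_path):
    """Merge the given logs, sort by timestamp, write out_path, echo to stdout."""
    lst = []
    for path in in_paths:
        with open(path, 'r') as fh:
            lst.extend(fh)
    lst.sort(key=epoch_seconds)
    with open(out_path, 'w') as out:
        out.writelines(lst)
    sys.stdout.writelines(lst)
    return lst
```

A call such as merge_logs(['file_1.log', 'file_2.log'], 'merged.log') would then produce both the merged file and the console output; the output file name here is just an example.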

Answer 5

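All of the other answers here read in all of the logs before printing the first line, which can be very slow, and can even break if the logs are too big.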

Like the solutions above, this one uses a regular expression and a strptime format, but it "merges" the logs as it reads them.

This means you can pipe its output to "head" or "less" and expect it to be fast.

import re
import time
import typing
from dataclasses import dataclass


t_fmt = "%Y%m%d.%H%M%S.%f"      # format of time stamps
t_pat = re.compile(r"([^ ]+)")  # pattern to extract timestamp

def get_time(line, prev_t):
    # uses the prev time if the time isn't found
    res = t_pat.search(line)
    if not res:
        return prev_t
    try:
        cur = time.strptime(res.group(1), t_fmt)
    except ValueError:
        return prev_t   
    return cur

def print_sorted(files):
    @dataclass
    class FInfo:
        path: str
        fh: typing.TextIO
        cur_l = ""
        cur_t = None

        def __read(self):
            line = self.fh.readline()
            self.cur_l += line
            if not line:
                # eof found, set time so file is sorted last
                self.cur_t = time.localtime(time.time() + 86400)
            else:
                # parse only the newly read line, not the accumulated text
                self.cur_t = get_time(line, self.cur_t)

        def read(self):
            # clear out the current line, and read
            self.cur_l = ""
            self.__read()
            while self.cur_t is None:
                self.__read()

    finfos = []
    for f in files:
        try:
            fh = open(f, "r")
        except FileNotFoundError:
            continue
        fi = FInfo(f, fh)
        fi.read()
        finfos.append(fi)

    while finfos:
        # get file with first (earliest) log entry
        fi = min(finfos, key=lambda x: x.cur_t)
        if not fi.cur_l:
            # every file is at eof
            break
        print(fi.cur_l, end="")
        fi.read()
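
If each input log is already in timestamp order on its own, the standard library's heapq.merge can do the same kind of lazy merging. This is a different technique from the class-based code above, not a rewrite of it. The sketch assumes the bracketed timestamp format from the question, assumes every line carries a timestamp, and uses made-up helper names (line_time, merge_sorted_logs).

```python
import heapq
import re
import time

T_FMT = '%a %b %d %H:%M:%S %Y'    # assumed: the format from the question
T_PAT = re.compile(r'\[(.+?)\]')  # first [...] group holds the timestamp


def line_time(line):
    """Parse the first [...] group of a line into a comparable struct_time."""
    return time.strptime(T_PAT.search(line).group(1), T_FMT)


def merge_sorted_logs(paths):
    """Lazily yield lines from several already-sorted logs in timestamp order."""
    handles = [open(p, 'r') for p in paths]
    try:
        # heapq.merge pulls one line at a time from each file (key= needs Python 3.5+)
        yield from heapq.merge(*handles, key=line_time)
    finally:
        for fh in handles:
            fh.close()
```

Usage would be along the lines of: for line in merge_sorted_logs(['1.log', '2.log']): print(line, end=''). Only one pending line per file is held in memory at a time, so piping the output to head returns quickly.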

Merging and sorting log files in Python
