Question

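I have multiple log files, each containing more than 10,000 lines and compressed with gzip. I need a way to parse each log file quickly to extract the relevant information, and then display statistics based on the information in all of the log files. I currently use gzip.open() to open each .gz file while walking the directory tree recursively, then run the contents through a primitive parser.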

import gzip
import os

def parse(logfile):
    for line in logfile:
        if "REPORT" in line:
            info = line.split()
            username = info[2]
            area = info[4]
            # Put info into dicts/lists etc.
        elif "ERROR" in line:
            info = line.split()
            ...

def main(args):
    argdir = args[1]
    for currdir, subdirs, files in os.walk(argdir):
        for filename in files:
            with gzip.open(os.path.join(currdir, filename), "rt") as log:
                parse(log)
    # Create a report at the end: createreport()

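Is there a way to optimize this process for each file? On my computer it currently takes about 28 seconds per .gz file, and every small optimization counts. I tried PyPy, and for some reason it takes about twice as long to process a file.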
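
One direction that might be worth measuring is parsing the files in separate processes and merging the per-file results at the end. The following is only a sketch under assumptions, not a tested fix for the 28-second case: it assumes the files can be parsed independently, it reuses the field positions from the parser above (info[2] as username, info[4] as area), and the names parse_file, find_logs, merge and parse_tree are hypothetical helpers introduced for illustration. Counting ERROR lines stands in for the elided error handling.

```python
import gzip
import os
from collections import Counter
from concurrent.futures import ProcessPoolExecutor


def parse_file(path):
    """Parse one .gz log and return per-file tallies (plain picklable objects)."""
    users, areas = Counter(), Counter()
    errors = 0
    with gzip.open(path, "rt") as log:
        for line in log:
            if "REPORT" in line:
                info = line.split()
                # Field positions are taken from the question; skip short lines.
                if len(info) > 4:
                    users[info[2]] += 1
                    areas[info[4]] += 1
            elif "ERROR" in line:
                # Placeholder: the question elides what happens with ERROR lines.
                errors += 1
    return users, areas, errors


def find_logs(root):
    """Yield every .gz file below root."""
    for currdir, _subdirs, files in os.walk(root):
        for filename in files:
            if filename.endswith(".gz"):
                yield os.path.join(currdir, filename)


def merge(results):
    """Combine per-file tallies into overall totals."""
    users, areas, errors = Counter(), Counter(), 0
    for file_users, file_areas, file_errors in results:
        users.update(file_users)
        areas.update(file_areas)
        errors += file_errors
    return users, areas, errors


def parse_tree(root, workers=None):
    """Parse all logs under root; workers=1 runs serially, otherwise in a process pool."""
    paths = list(find_logs(root))
    if workers == 1:
        return merge(map(parse_file, paths))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return merge(pool.map(parse_file, paths))
```

A call such as parse_tree(argdir) would then feed the report step. When run as a script, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Whether this helps depends on the number of cores and on how much of the time is spent in decompression versus parsing, so it is worth profiling a single file first.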

Fast .gz log file parsing in Python

0 answers: