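Reading a multi-line log with regular expressions in Python

Time: 2019-03-06 07:26:19

Tags: regex python-3.x

I want to pick the executed queries out of a log file. Specifically, an example looks like this: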

2019-01-10 10:33:21 +07 dvdrentalLOG: statement:  SELECT last_update 
    From public.actor
2019-03-06 14:07:06 +07 dvdrentalLOG:  statement: SELECT film_id, title
    FROM public.film
    WHERE film_id = 1

I want to get the queries using a loop. Desired output:

query1 : SELECT last_update From public.actor
query2 : SELECT film_id, title FROM public.film WHERE film_id = 1

I tried:

import re
def parseFile(filepath):
    line=[]
    with open(filepath,'r') as log:
        regex = re.compile(r'(\d{4}-\d{2}-\d{2})(.*)',re.MULTILINE|re.DOTALL)
        for line in log:
            date = regex.findall(line)
            if date == []:
                print()
            else:
                print(date)

filepath = 'text.txt'
parseFile(filepath)

output:
 [('2019-01-10', ' 10:33:21 +07 dvdrentalLOG: statement:  SELECT last_update \n')]
 [('2019-03-06', ' 14:07:06 +07 dvdrentalLOG:  statement: SELECT film_id, title\n')]

The output does not contain the whole of each query; the continuation lines are missing. What should I do?

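2 answers:

Answer 0 (score: 1)

You are only processing one line at a time (through the for line in log: loop), so your regex is only ever applied to a single line. It cannot match across lines because you never give it more than one line to match against.

You could instead read the whole file via log.read() and then call .findall on that.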
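As a rough illustration of that difference, here is a minimal sketch that runs the same pattern once per line and once on the whole text. The sample text is copied from the question; the names SAMPLE, ENTRY_RE and find_entries are made up for this example, and the pattern assumes every log entry starts with a date at the beginning of a line.

```python
import re

# Sample text from the question; for a real file this would be log.read().
SAMPLE = (
    "2019-01-10 10:33:21 +07 dvdrentalLOG: statement:  SELECT last_update \n"
    "    From public.actor\n"
    "2019-03-06 14:07:06 +07 dvdrentalLOG:  statement: SELECT film_id, title\n"
    "    FROM public.film\n"
    "    WHERE film_id = 1\n"
)

# One entry = a line starting with a date, plus everything up to the next
# line that starts with a date (or the end of the text).
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})(.*?)(?=^\d{4}-\d{2}-\d{2}|\Z)",
    re.MULTILINE | re.DOTALL,
)


def find_entries(text):
    """Return (date, rest_of_entry) tuples; rest_of_entry may span lines."""
    return ENTRY_RE.findall(text)


# Line by line: the regex never sees the continuation lines of an entry.
per_line = [
    match
    for line in SAMPLE.splitlines(keepends=True)
    for match in find_entries(line)
]

# Whole text at once: each entry keeps its continuation lines.
whole = find_entries(SAMPLE)

print(repr(per_line[1][1]))
print(repr(whole[1][1]))
```

The second print includes the FROM and WHERE lines, while the first stops at the end of the first line of the entry. Reading everything into memory is fine for small logs; for very large files a line-by-line loop that accumulates continuation lines may be the better trade-off.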

Answer 1 (score: 0)

You can modify your code like this. The whole file has to be read before parsing: if you read it line by line, as in your code, the regex only ever sees one line at a time and can never capture a complete SQL query that is split over several lines.

import re

def parseFile(filepath):
    with open(filepath, 'r') as log:
        # Lazily match from one date up to the next date or the end of the text.
        regex = re.compile(r'(\d{4}-\d{2}-\d{2})(.*?)(?=\d{4}-\d{2}-\d{2}|$)', re.MULTILINE | re.DOTALL)
        # Collapse every run of whitespace, newlines included, into one space.
        lines = re.sub(r'\s+', ' ', log.read())
        date = regex.findall(lines)
        if date == []:
            print()
        else:
            print(date)

filepath = 'query.log'
parseFile(filepath)

Output:

[('2019-01-10', ' 10:33:21 +07 dvdrentalLOG: statement: SELECT last_update From public.actor '), ('2019-03-06', ' 14:07:06 +07 dvdrentalLOG: statement: SELECT film_id, title FROM public.film WHERE film_id = 1 ')]

The regex used is explained in detail here (it uses a positive lookahead to limit how many characters .*? matches): https://regex101.com/r/nE0omm/1/
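The tuples in this answer's output still carry the time and the dvdrentalLOG: statement: prefix, whereas the question asks for query1 : ... lines. A minimal sketch of one way to get there, assuming every entry contains the literal marker statement: on its first line and starts with a date at the beginning of a line (LOG_TEXT, STATEMENT_RE and extract_queries are names invented for this example):

```python
import re

# Sample text from the question; for a real file this would be log.read().
LOG_TEXT = (
    "2019-01-10 10:33:21 +07 dvdrentalLOG: statement:  SELECT last_update \n"
    "    From public.actor\n"
    "2019-03-06 14:07:06 +07 dvdrentalLOG:  statement: SELECT film_id, title\n"
    "    FROM public.film\n"
    "    WHERE film_id = 1\n"
)

# Skip the timestamp and prefix up to "statement:", then capture lazily until
# the next line that starts with a date, or the end of the text.
STATEMENT_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} [^\n]*?statement:\s*(.*?)(?=^\d{4}-\d{2}-\d{2} |\Z)",
    re.MULTILINE | re.DOTALL,
)


def extract_queries(text):
    """Return each logged statement with runs of whitespace collapsed."""
    return [" ".join(raw.split()) for raw in STATEMENT_RE.findall(text)]


for number, query in enumerate(extract_queries(LOG_TEXT), start=1):
    print(f"query{number} : {query}")
```

Because the lookahead is anchored with ^, a date that appears in the middle of a line (for example inside a WHERE clause) does not cut the query short; a date at the very start of a continuation line still would.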