I want to extract the executed queries from a log file. Specifically, a sample of the log looks like this:
2019-01-10 10:33:21 +07 dvdrentalLOG: statement: SELECT last_update
From public.actor
2019-03-06 14:07:06 +07 dvdrentalLOG: statement: SELECT film_id, title
FROM public.film
WHERE film_id = 1
I want to get the queries using a loop. Desired output:
query1 : SELECT last_update From public.actor
query2 : SELECT film_id, title FROM public.film WHERE film_id = 1
I tried:
import re

def parseFile(filepath):
    line=[]
    with open(filepath,'r') as log:
        regex = re.compile(r'(\d{4}-\d{2}-\d{2})(.*)',re.MULTILINE|re.DOTALL)
        for line in log:
            date = regex.findall(line)
            if date == []:
                print()
            else:
                print(date)

filepath = 'text.txt'
parseFile(filepath)
Output:
[('2019-01-10', ' 10:33:21 +07 dvdrentalLOG: statement: SELECT last_update \n')]
[('2019-03-06', ' 14:07:06 +07 dvdrentalLOG: statement: SELECT film_id, title\n')]
The output only contains the first line of each query, not the whole query. What should I do?
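Answer 0 (score: 1)
You are processing only one line at a time (via the for line in log: loop), so your regex is only ever applied to a single line. It cannot match across lines because you never give it more than one line to match against.
You could instead read the entire file with log.read() and then call .findall on the result.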
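As a minimal sketch of that suggestion (not the asker's code): the sample log is inlined as a string so the example runs on its own, and the names SAMPLE_LOG and extract_queries are made up for illustration. It assumes every entry starts with a YYYY-MM-DD date at the beginning of a line and that the query follows "statement: ". With a real file you would pass log.read() to the function instead.

```python
import re

# Inlined copy of the log sample from the question (hypothetical stand-in for log.read()).
SAMPLE_LOG = (
    "2019-01-10 10:33:21 +07 dvdrentalLOG: statement: SELECT last_update \n"
    "From public.actor\n"
    "2019-03-06 14:07:06 +07 dvdrentalLOG: statement: SELECT film_id, title\n"
    "FROM public.film\n"
    "WHERE film_id = 1\n"
)

# Capture everything after "statement: " up to the next line that starts
# with a date, or up to the end of the text. DOTALL lets "." cross newlines;
# MULTILINE lets "^" match at the start of every line.
STATEMENT = re.compile(
    r"statement: (.*?)(?=^\d{4}-\d{2}-\d{2} |\Z)",
    re.MULTILINE | re.DOTALL,
)

def extract_queries(text):
    """Return each logged statement as a single line of text."""
    # split() + join collapses newlines and repeated spaces into one space.
    return [" ".join(raw.split()) for raw in STATEMENT.findall(text)]

# Print in the "queryN : ..." format asked for in the question.
for number, query in enumerate(extract_queries(SAMPLE_LOG), start=1):
    print(f"query{number} : {query}")
```

The point is that the regex sees the whole text at once, so a statement that spans several lines ends up in a single match.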
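Answer 1 (score: 0)
You can modify your code like this. You need to read the whole file before parsing it: if you read it line by line as in your code, the regex can only be applied to one line at a time and will never capture a whole SQL query that is split across several lines.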
import re

def parseFile(filepath):
    # Lazily match from one date up to the next date (or the end of the text).
    regex = re.compile(r'(\d{4}-\d{2}-\d{2})(.*?)(?=\d{4}-\d{2}-\d{2}|$)', re.MULTILINE | re.DOTALL)
    with open(filepath, 'r') as log:
        # Read the whole file and replace newlines and runs of whitespace with a single space.
        lines = re.sub(r'\n|\s{2,}', ' ', log.read())
    date = regex.findall(lines)
    if date == []:
        print()
    else:
        print(date)

filepath = 'query.log'
parseFile(filepath)
Output:
[('2019-01-10', ' 10:33:21 +07 dvdrentalLOG: statement: SELECT last_update From public.actor '), ('2019-03-06', ' 14:07:06 +07 dvdrentalLOG: statement: SELECT film_id, title FROM public.film WHERE film_id = 1 ')]
The regex used is explained in detail here (it uses a positive lookahead to limit the number of characters matched by .*?): https://regex101.com/r/nE0omm/1/
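To see why the lazy quantifier matters, here is a small sketch that compares .*? with a greedy .* on the flattened log text. The flattened string is copied from the sample in the question, and the variable names (flat, lazy, greedy) are only for illustration.

```python
import re

# The log sample after newlines have been replaced by spaces.
flat = (
    "2019-01-10 10:33:21 +07 dvdrentalLOG: statement: SELECT last_update From public.actor "
    "2019-03-06 14:07:06 +07 dvdrentalLOG: statement: SELECT film_id, title "
    "FROM public.film WHERE film_id = 1 "
)

DATE = r"\d{4}-\d{2}-\d{2}"

# Lazy: stop as soon as the lookahead sees the next date (or the end).
lazy = re.compile(rf"({DATE})(.*?)(?={DATE}|$)", re.DOTALL)
# Greedy: consume as much as possible, so the lookahead is only satisfied at the end.
greedy = re.compile(rf"({DATE})(.*)(?={DATE}|$)", re.DOTALL)

lazy_matches = lazy.findall(flat)
greedy_matches = greedy.findall(flat)

print(len(lazy_matches))    # one tuple per log entry
print(len(greedy_matches))  # everything swallowed into a single tuple
```

With the lazy version each log entry becomes its own (date, rest) tuple, while the greedy version runs past the second date and returns the whole text as one match.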