Question

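I'm trying to parse a log file and extract certain capture groups, such as the timestamp, the username, and so on. When I run the code below, the result is a list whose elements are tuples holding the capture groups (the search results). Basically, I'm curious why I'm getting a '\n' character in one of the capture groups, where I don't want it.

I tried modifying the regex pattern but couldn't fix the problem.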

import re

with open('obis1-query.log') as myfile:
    StartTime = []
    # Read the whole file into a single string, newlines included
    myfile = myfile.read()
    # Raw string so that \d, \s and \w reach the regex engine unchanged
    mysearch = re.findall(r'(?P<datetime>\d+-\d+-\d+T\d+:\d+:\d+.\d+-05:00).\s.\w+.\s.\w+:\d.\s.+ecid:\s[A-Za-z\d,:-]+.\s.sik:\s\w+.\s.tid:\s\w+.\s.messageid:\s\w+-\d+.\s.requestid:\s\w+.\s.(?P<sessionid>sessionid:\s\w+).\s.(?P<username>username:\s\w+).\s#+\s\[\[\s-+\sSQL\sRequest,\s(?P<logreqhash>logical\srequest\shash:\n?\w+)', myfile)

# re.findall always returns a list (never None), so this check always passes
if mysearch is not None:
    StartTime.append(mysearch)
    print(StartTime)

The output looks like this:

[[('2019-06-12T09:14:54.947-05:00', 'sessionid: bf710000', 'username: kadaniel', 'logical request hash:\n83bf7e6f'),
  ('2019-06-12T09:14:55.343-05:00', 'sessionid: bf710000', 'username: kadaniel', 'logical request hash:\n8e45939b'),
  ('2019-06-12T09:14:55.362-05:00', 'sessionid: bf710000', 'username: kadaniel', 'logical request hash:\n4496de01'),

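I just want to remove the '\n' character that sits between "logical request hash:" and the value that follows it (4496de01 in the last case) from the results.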

Answer 1

Consider using .replace("\n", "") to remove the \n character.
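Since re.findall returns a list of tuples here, .replace has to be applied to each field rather than to the list itself. A minimal sketch of that, assuming the result has the shape shown in the question (the helper name strip_newlines is made up for illustration):

```python
# Sample data shaped like the question's output: one tuple per match.
matches = [
    ('2019-06-12T09:14:54.947-05:00', 'sessionid: bf710000',
     'username: kadaniel', 'logical request hash:\n83bf7e6f'),
    ('2019-06-12T09:14:55.343-05:00', 'sessionid: bf710000',
     'username: kadaniel', 'logical request hash:\n8e45939b'),
]


def strip_newlines(rows):
    """Return a copy of rows with newline characters removed from every field."""
    # strip_newlines is a hypothetical helper, not part of the re module.
    return [tuple(field.replace("\n", "") for field in row) for row in rows]


cleaned = strip_newlines(matches)
print(cleaned[0][3])  # logical request hash:83bf7e6f
```

This cleans the results after the fact; it does not change what the pattern matches.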

Answer 2

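You are reading the entire file into one string and then searching that string. The file (and therefore the string) contains instances of '\n', and your pattern matches them.

Consider using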
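To see why the newline ends up in the result, note that the last group in the question's pattern contains \n? inside its parentheses, so whatever it matches is captured. The sketch below uses a made-up one-record string (an assumption, not the real log) and contrasts that with a variant, not taken from the question, that matches the whitespace outside the groups:

```python
import re

# Made-up snippet that mimics the end of one log record.
text = "SQL Request, logical request hash:\n83bf7e6f\n"

# \n? inside the group: the newline becomes part of the captured text.
inside = re.findall(r"(logical\srequest\shash:\n?\w+)", text)

# \s* outside the groups: the newline is still matched, but not captured.
outside = re.findall(r"(logical\srequest\shash:)\s*(\w+)", text)

print(inside)   # ['logical request hash:\n83bf7e6f']
print(outside)  # [('logical request hash:', '83bf7e6f')]
```

The second form returns the label and the hash as separate fields, which may or may not suit the rest of the script.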

for line in myfile.readlines():
    # Search line for regex
    pass  # placeholder so the loop body is valid Python

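to parse one line at a time, which essentially keeps newlines out of the middle of a match. Note that each line returned by readlines() still ends with its own '\n', and a pattern that expects a record to span two lines will no longer match it as a single unit.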
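A rough sketch of line-by-line parsing, under the assumption (suggested by the output in the question) that the hash value sits on the line after "logical request hash:". The sample text, the simplified timestamp pattern, and the records structure are all illustrative stand-ins, not the real log format:

```python
import io
import re

# Hypothetical stand-in for the log file; the real layout may differ.
sample = io.StringIO(
    "[2019-06-12T09:14:54.947-05:00] ... SQL Request, logical request hash:\n"
    "83bf7e6f\n"
)

# Simplified timestamp pattern, with the dot before the milliseconds escaped.
timestamp_re = re.compile(r"\d+-\d+-\d+T\d+:\d+:\d+\.\d+-05:00")

records = []
expect_hash = False
for line in sample:            # iterating a file object yields one line at a time
    line = line.rstrip("\n")   # every line keeps its trailing '\n' until stripped
    if expect_hash:
        # The previous line ended with the label, so this line is the hash.
        records[-1]["hash"] = line
        expect_hash = False
        continue
    m = timestamp_re.search(line)
    if m and line.endswith("logical request hash:"):
        records.append({"datetime": m.group(0)})
        expect_hash = True

print(records)  # [{'datetime': '2019-06-12T09:14:54.947-05:00', 'hash': '83bf7e6f'}]
```

With a real file, `sample` would be replaced by the object returned from open(). The trade-off is that a record split across lines needs a little state (here the expect_hash flag) instead of one large pattern.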

Why does a '\n' character appear in the results of a matched regex pattern?
