I have the following log file:
*** 2018-09-14T12:36:39.560671+02:00 (DB_NAME)
*** SESSION ID:(12345) 2018-09-14T12:36:39.560750+02:00
*** CLIENT ID:() 2018-09-14T12:36:39.560774+02:00
*** SERVICE NAME:(DB_NAME) 2018-09-14T12:36:39.560798+02:00
*** MODULE NAME:(mod_name_action (TNS V1-V3)) 2018-09-14T12:36:39.560822+02:00
*** ACTION NAME:() 2018-09-14T12:36:39.560848+02:00
*** CLIENT DRIVER:() 2018-09-14T12:36:39.560875+02:00
*** CONTAINER ID:(1) 2018-09-14T12:36:39.560926+02:00
I want to store the MODULE NAME value, so I need to extract it from this line:
*** MODULE NAME:(mod_name_action (TNS V1-V3)) 2018-09-14T12:36:39.560822+02:00
so that I get just this:
mod_name_action (TNS V1-V3)
I have to do this in Python. I'm trying something like this:
log_i=open(logname,"r")
for line_of_log in log_i:
    #search the MODULE
    module = "MODULE NAME:("
    str_found_at = line_of_log.find(module)
    if str_found_at != -1:
        regex = r"MODULE NAME:([a-zA-Z]+)"
        MODULE = re.findall(regex, line_of_log)
        print "MODULE_A==>", MODULE
log_i.close()
But, of course, it doesn't work.
Can anyone help me?
Thanks.
Answer 0: (score: 0)
Use a regular expression.
Demo:
import re
s = """*** 2018-09-14T12:36:39.560671+02:00 (DB_NAME)
*** SESSION ID:(12345) 2018-09-14T12:36:39.560750+02:00
*** CLIENT ID:() 2018-09-14T12:36:39.560774+02:00
*** SERVICE NAME:(DB_NAME) 2018-09-14T12:36:39.560798+02:00
*** MODULE NAME:(mod_name_action (TNS V1-V3)) 2018-09-14T12:36:39.560822+02:00
*** ACTION NAME:() 2018-09-14T12:36:39.560848+02:00
*** CLIENT DRIVER:() 2018-09-14T12:36:39.560875+02:00
*** CONTAINER ID:(1) 2018-09-14T12:36:39.560926+02:00"""
res = []
for line in s.splitlines():
    m = re.search(r"(?<=MODULE NAME:\()(.*?)(?=\)\))", line)
    if m:
        res.append(m.group()+")")
print(res)
Output:
['mod_name_action (TNS V1-V3)']
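As an alternative sketch (Python 3, not part of the answer above), a single greedy capture group avoids the lookarounds and the need to re-append the closing parenthesis. It assumes that the value is followed by ')' and then whitespace and a timestamp starting with a four-digit year, as in the sample log. The names MODULE_RE and extract_module_name are made up for illustration.

```python
import re

# Greedy (.*) runs to the LAST ')' that is followed by whitespace and a
# four-digit year, so parentheses nested inside the value are kept.
# Assumption: every header line ends with a timestamp like 2018-09-14T...
MODULE_RE = re.compile(r"MODULE NAME:\((.*)\)\s+\d{4}-")

def extract_module_name(line):
    """Return the MODULE NAME value from one log line, or None if absent."""
    m = MODULE_RE.search(line)
    return m.group(1) if m else None

line = "*** MODULE NAME:(mod_name_action (TNS V1-V3)) 2018-09-14T12:36:39.560822+02:00"
print(extract_module_name(line))
```

Because the closing parenthesis is matched outside the group, nothing has to be appended to the result afterwards.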
Answer 1: (score: 0)
You can do this without regex. I'll use the .splitlines method to put your log data into a list of lines (preserving the newlines), so that we can loop over it like a file.
We can use 'in' to find the line that contains "MODULE NAME:", and then we only need to search for the first '(' and the last ')' on that line so that we can slice out the substring containing the name.
log_i = '''\
*** 2018-09-14T12:36:39.560671+02:00 (DB_NAME)
*** SESSION ID:(12345) 2018-09-14T12:36:39.560750+02:00
*** CLIENT ID:() 2018-09-14T12:36:39.560774+02:00
*** SERVICE NAME:(DB_NAME) 2018-09-14T12:36:39.560798+02:00
*** MODULE NAME:(mod_name_action (TNS V1-V3)) 2018-09-14T12:36:39.560822+02:00
*** ACTION NAME:() 2018-09-14T12:36:39.560848+02:00
*** CLIENT DRIVER:() 2018-09-14T12:36:39.560875+02:00
*** CONTAINER ID:(1) 2018-09-14T12:36:39.560926+02:00
'''.splitlines(True)
for line_of_log in log_i:
    #search for the MODULE NAME line
    if "MODULE NAME:" in line_of_log:
        # Find the location of the first '('
        start = line_of_log.index('(')
        # Find the location of the last ')'
        end = line_of_log.rindex(')')
        modname = line_of_log[start+1:end]
        print "MODULE_A==>", modname
Output
MODULE_A==> mod_name_action (TNS V1-V3)
If there is only one "MODULE NAME:" line in the log (or if there are several but you only want to print the first one), you should put a 'break' statement after the 'print' statement, so that you don't waste time checking all the following lines in the file.
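The slicing approach from this answer can also be packaged as a small function that stops at the first match. This is only a sketch in Python 3 (where print is a function). The name first_module_name is hypothetical, and it assumes the value sits between the first '(' after the label and the last ')' on the line.

```python
def first_module_name(lines):
    """Return the value of the first 'MODULE NAME:' line, or None."""
    label = "MODULE NAME:"
    for line in lines:
        pos = line.find(label)
        if pos == -1:
            continue
        # Look for '(' only after the label, so earlier text is ignored.
        start = line.find("(", pos + len(label))
        # The last ')' on the line closes the value.
        end = line.rfind(")")
        if start != -1 and end > start:
            # Returning here plays the role of 'break': later lines are skipped.
            return line[start + 1:end]
    return None

log_lines = [
    "*** SERVICE NAME:(DB_NAME) 2018-09-14T12:36:39.560798+02:00\n",
    "*** MODULE NAME:(mod_name_action (TNS V1-V3)) 2018-09-14T12:36:39.560822+02:00\n",
    "*** ACTION NAME:() 2018-09-14T12:36:39.560848+02:00\n",
]
print("MODULE_A==>", first_module_name(log_lines))
```

Since the function accepts any iterable of lines, an open file object could be passed in place of the list.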
Answer 2: (score: 0)
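This doesn't work because your regex pattern is incorrect: the pattern '[a-zA-Z]+' does not match special characters such as '_' and '-'. Also, if you want to strip the parentheses, you have to include them in the pattern, escaped with the '\' character. Finally, instead of using
str_found_at = line_of_log.find(module)
you can search for a substring in a Python string directly with 'in'. Putting it together, I suggest the following code: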
import re

log_i=open(logname,"r")
for line_of_log in log_i:
    #search the MODULE
    module = "MODULE NAME:("
    if module in line_of_log:
        # \( and \) match the literal parentheses; (.+) captures the value
        regex = r"MODULE NAME:\((.+)\)"
        MODULE = re.findall(regex, line_of_log)
        print "MODULE_A==>", MODULE[0]
log_i.close()
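To see why the pattern from the question finds nothing while the corrected one works, the two can be compared on the sample line. This is a sketch in Python 3, and the variable names original and corrected are illustrative only.

```python
import re

line = "*** MODULE NAME:(mod_name_action (TNS V1-V3)) 2018-09-14T12:36:39.560822+02:00"

# Pattern from the question: the unescaped '(' opens a capture group rather
# than matching a literal parenthesis, so [a-zA-Z]+ must match right after
# the colon, where the text actually has a literal '('.
original = re.findall(r"MODULE NAME:([a-zA-Z]+)", line)

# Corrected pattern: \( and \) match the literal parentheses, and the
# greedy (.+) extends to the last ')' on the line.
corrected = re.findall(r"MODULE NAME:\((.+)\)", line)

print(original)   # []
print(corrected)  # ['mod_name_action (TNS V1-V3)']
```

The greedy (.+) matters here: a non-greedy (.+?) would stop at the first ')' and cut the value off after 'V1-V3'.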