I have log messages in this format:
[2013-Mar-05 18:21:45.415053] (ThreadID) <Module name> [Logging level] Message Description : This is the message.
I want to create a dictionary of this form:
{'time stamp': 2013-Mar-05 18:21:45.415053, 'ThreadId': 4139, 'Module name': ModuleA , 'Message Description': My Message, 'Message' : This is the message }
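I tried splitting the log message on whitespace, so that I can pick out the tokens and build a list. Like this:
for i in line1.split(" "):
This would give me tokens like these:
['2013-Mar-05', '18:21:45.415053]', '(ThreadID)', '<Module name>', '[Logging level]', 'Message Description', ':', 'This is the message.']
I then pick the tokens I need and put them into the desired list.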
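Is there a better way to extract the tokens in this case? There is a pattern here: the time stamp is always inside [] brackets, the threadId is inside (), and the module name is inside <>.
Can we use this information to extract the tokens directly?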
Answer 0 (score: 1)
Use a regular expression. Hope this helps!
import re
string = '[2013-Mar-05 18:21:45.415053] (4444) <Module name> [Logging level] Message Description : This is the message.'
regex = re.compile(r'\[(?P<timestamp>[^\]]*?)\] \((?P<threadid>[^\)]*?)\) \<(?P<modulename>[^\>]*?)\>[^:]*?\:(?P<message>.*?)$')
for match in regex.finditer(string):
    # Build the dictionary from the named groups (named "record" to avoid shadowing the built-in dict)
    record = {'timestamp': match.group("timestamp"), 'threadid': match.group("threadid"), 'modulename': match.group('modulename'), 'message': match.group('message')}
    print(record)
Output:
{'timestamp': '2013-Mar-05 18:21:45.415053', 'message': ' This is the message.', 'modulename': 'Module name', 'threadid': '4444'}
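Explanation: I use named groups to mark the parts of the regular expression that I want to use later in the script. See http://docs.python.org/2/library/re.html for details. Basically, I walk through the line from left to right, looking for the delimiters [, (, < and so on.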
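As a rough sketch of how the named-group idea might be applied to more than one line, the snippet below wraps a similar pattern in a helper that returns None for lines that do not fit the format. The names `LOG_PATTERN` and `parse_line` are hypothetical, and the pattern assumes the bracketed fields are separated by single spaces, as in the sample line.

```python
import re

# Hypothetical pattern: one named group per delimited field, then everything
# after the first ':' that follows the module name is treated as the message.
LOG_PATTERN = re.compile(
    r'\[(?P<timestamp>[^\]]*)\] '
    r'\((?P<threadid>[^)]*)\) '
    r'<(?P<modulename>[^>]*)>'
    r'[^:]*:\s*(?P<message>.*)$'
)

def parse_line(line):
    """Return a dict of the named groups, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

lines = [
    '[2013-Mar-05 18:21:45.415053] (4444) <Module name> [Logging level] '
    'Message Description : This is the message.',
    'this line is not in the expected format',
]
# Keep only the lines that could be parsed.
records = [rec for rec in map(parse_line, lines) if rec is not None]
print(records)
```

Returning None instead of raising keeps the loop simple when a log file contains stray lines such as stack traces; whether that is the right behaviour depends on how strict the parsing needs to be.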
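Answer 1 (score: 1)
This is very similar to @Oli's answer, but the regex is more readable, and I use groupdict() so there is no need to build a new dictionary by hand, since the match object creates it. The log string is parsed from left to right, with each group consuming its part of the line.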
import re
fmt = re.compile(
    r'\[(?P<timestamp>.+?)\]\s+'  # Save everything within [] to group timestamp
    r'\((?P<thread_id>.+?)\)\s+'  # Save everything within () to group thread_id
    r'\<(?P<module_name>.+?)\>\s+'  # Save everything within <> to group module_name
    r'\[(?P<log_level>.+?)\]\s+'  # Save everything within [] to group log_level
    r'(?P<message_desc>.+?)(\s:\s|$)'  # Save everything before \s:\s or end of line to group message_desc
    r'(?P<message>.+$)?'  # If there was a \s:\s, save everything after it to group message. This last group is optional
)
Example:
log = '[2013-Mar-05 18:21:45.415053] (4139) <ModuleA> [DEBUG] Message Description : An example message!'
match = fmt.search(log)
print(match.groupdict())
{'log_level': 'DEBUG',
'message': 'An example message!',
'message_desc': 'Message Description',
'module_name': 'ModuleA',
'thread_id': '4139',
'timestamp': '2013-Mar-05 18:21:45.415053'}
Example with the first test string from the comments on this answer:
log = '[2013-Mar-05 18:21:45.415053] (0x7aa5e3a0) <Logger> [Info] Opened settings file : /usr/local/ABC/ABC/var/loggingSettings.ini'
match = fmt.search(log)
print(match.groupdict())
{'log_level': 'Info',
'message': '/usr/local/ABC/ABC/var/loggingSettings.ini',
'message_desc': 'Opened settings file',
'module_name': 'Logger',
'thread_id': '0x7aa5e3a0',
'timestamp': '2013-Mar-05 18:21:45.415053'}
Example with the second test string from the comments on this answer:
log = '[2013-Mar-05 18:21:45.415053] (0x7aa5e3a0) <Logger> [Info] Creating a new settings file'
match = fmt.search(log)
print(match.groupdict())
{'log_level': 'Info',
'message': None,
'message_desc': 'Creating a new settings file',
'module_name': 'Logger',
'thread_id': '0x7aa5e3a0',
'timestamp': '2013-Mar-05 18:21:45.415053'}
Edit: fixed to work with the OP's example.
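The values in the resulting dictionary are all strings. If an actual timestamp object is needed, one option is `datetime.strptime`. The following is only a sketch: it assumes every timestamp looks like the one in the sample and that the month abbreviation is English (`%b` depends on the locale). The names `TIMESTAMP_FORMAT` and `parse_timestamp` are hypothetical.

```python
from datetime import datetime

# Assumed layout, taken from the sample line: 2013-Mar-05 18:21:45.415053
TIMESTAMP_FORMAT = '%Y-%b-%d %H:%M:%S.%f'

def parse_timestamp(text):
    """Convert the captured timestamp string into a datetime object."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)

# A record shaped like the groupdict() result above (values are strings).
record = {'timestamp': '2013-Mar-05 18:21:45.415053', 'thread_id': '4139'}
record['timestamp'] = parse_timestamp(record['timestamp'])
print(record['timestamp'].isoformat())
```

`strptime` raises `ValueError` when the text does not match the format, so a log with mixed timestamp styles would need extra handling.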
Answer 2 (score: 0)
How about the following? (The comments explain what is going on.)
log = '[2013-Mar-05 18:21:45.415053] (ThreadID) <Module name> [Logging level] Message Description : This is the message.'
# Define functions on how to process the different kinds of tokens
time_stamp = logging_level = lambda x: x.strip('[ ]')
thread_ID = lambda x: x.strip('( )')
module_name = lambda x: x.strip('< >')
message_description = message = lambda x: x.strip()
# Names of the tokens used to make the dictionary keys
keys = ['time stamp', 'ThreadId',
        'Module name', 'Logging level',
        'Message Description', 'Message']
# Define functions on how to process the message
funcs = [time_stamp, thread_ID,
         module_name, logging_level,
         message_description, message]
# Define the tokens at which to split the message
split_on = [']', ')', '>', ']', ':']
msg_dict = {}
for i in range(len(split_on)):
    # Split up the log one token at a time
    temp, log = log.split(split_on[i], 1)
    # Process the token using the defined function
    msg_dict[keys[i]] = funcs[i](temp)
# Process the last token: whatever remains after the final split is the message
msg_dict[keys[-1]] = funcs[-1](log)
print(msg_dict)
Answer 3 (score: 0)
Although using re is simpler in this case, if you would rather not use it, try this:
string = '[2013-Mar-05 18:21:45.415053] (ThreadID) <Module name> [Logging level] Message Description : This is the message.'
# The main function: return the items between start and end,
# plus the indexes of the outermost start/end delimiters.
def get_between(start, end, string):
    in_between = 0  # current nesting depth
    c_str = ''  # characters collected for the current item
    items = []
    indexes = []
    for i in range(len(string)):
        char = string[i]
        if char == start:
            if in_between == 0: indexes.append(i)  # if starting bracket
            in_between += 1
        elif char == end:
            in_between -= 1
            if in_between == 0: indexes.append(i)  # if ending bracket
        elif in_between > 0:
            c_str += char
        if in_between == 0 and c_str != '':  # after ending bracket
            items.append(c_str)
            c_str = ''
    return items, indexes
# Both Time Stamp and Logging Level are between []s,
# and the message comes after Logging Level.
data, last_indexes = get_between('[', ']', string)
time_stamp, logging_level = data
# We only want the first item in the first list
thread_id = get_between('(', ')', string)[0][0]
module = get_between('<', '>', string)[0][0]
last = max(last_indexes)
# Extract the message: everything after the first ':' that follows the last ']'
# (partition keeps any further ':' characters inside the message)
message = string[last+1:].partition(':')[2].strip()
mydict = {'Time': time_stamp, 'Thread ID': thread_id, 'Module': module, 'Logging Level': logging_level, 'Message': message}
print(mydict)
Here, we collect the characters between each pair of delimiters and work with those.
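For the flat, non-nested format in the question, the same delimiter idea can be sketched more compactly with `str.find` and `str.partition`. This is only an illustration under that assumption; the helper name `between` and the dictionary keys are hypothetical.

```python
def between(text, start, end, pos=0):
    """Return (content, next_pos) for the first start...end pair at or after pos.

    Assumes the delimiters are not nested. Returns (None, pos) if no pair is found.
    """
    open_at = text.find(start, pos)
    if open_at == -1:
        return None, pos
    close_at = text.find(end, open_at + 1)
    if close_at == -1:
        return None, pos
    return text[open_at + 1:close_at], close_at + 1

line = ('[2013-Mar-05 18:21:45.415053] (ThreadID) <Module name> '
        '[Logging level] Message Description : This is the message.')

pos = 0
fields = {}
# Walk the line left to right, one delimiter pair per field.
for key, (start, end) in [('Time', '[]'), ('Thread ID', '()'),
                          ('Module', '<>'), ('Logging Level', '[]')]:
    fields[key], pos = between(line, start, end, pos)

# Whatever follows the last bracket is "description : message".
description, _, message = line[pos:].partition(':')
fields['Message Description'] = description.strip()
fields['Message'] = message.strip()
print(fields)
```

Because each search starts where the previous one stopped, the two `[]` fields are picked up in order without needing to track indexes separately.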
Answer 4 (score: 0)
If you have a consistent log format, why not use named constants for the indexes?
Example:
DATE = 0
TIME = 1
TID = 2
MODULE = 3
LOG_LVL = 4
MESSAGE = 7  # index 5 is the message description and 6 is the ':' separator
log = ['2013-Mar-05', '18:21:45.415053]', '(ThreadID)', '<Module name>', '[Logging level]', 'Message Description', ':', 'This is the message.']
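Then you can simply use log[DATE] and so on, right? Finally, before relying on index-based access, use ".join" on the chunks that belong together. After that you can fill in the dictionary you want.
It is not as neat as Oli's solution, but it gets the job done :)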
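A minimal sketch of this index-based approach is shown below. It assumes the tokens have already been produced in exactly the shape listed in the question; a plain split(" ") would also break multi-word fields apart, so those would need to be joined first. The constant names `MSG_DESC` and `SEP` and the dictionary keys are hypothetical.

```python
# Hypothetical index names for the token list from the question.
DATE, TIME, TID, MODULE, LOG_LVL, MSG_DESC, SEP, MESSAGE = range(8)

tokens = ['2013-Mar-05', '18:21:45.415053]', '(ThreadID)', '<Module name>',
          '[Logging level]', 'Message Description', ':', 'This is the message.']

record = {
    # Date and time are separate tokens, so join them back together first.
    'time stamp': ' '.join(tokens[DATE:TIME + 1]).strip('[]'),
    'ThreadId': tokens[TID].strip('()'),
    'Module name': tokens[MODULE].strip('<>'),
    'Logging level': tokens[LOG_LVL].strip('[]'),
    'Message Description': tokens[MSG_DESC],
    'Message': tokens[MESSAGE],
}
print(record)
```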