Question

I have log messages in the following format:

[2013-Mar-05 18:21:45.415053] (ThreadID) <Module name> [Logging level]    Message Desciption : This is the message.

I want to create a dictionary of the form:

{'time stamp': 2013-Mar-05 18:21:45.415053, 'ThreadId': 4139, 'Module name': ModuleA , 'Message Description': My Message, 'Message' : This is the message }

I tried splitting the log message on whitespace so that I could pick out the tokens and build a list. Like this:

for i in line1.split(" "):

This gives tokens like these:

['2013-Mar-05', '18:21:45.415053]', '(ThreadID)', '<Module name>', '[Logging level]',    'Message Desciption', ':', 'This is the message.']

Then I pick out the tokens and put them into the desired list.

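Is there a better way to extract the tokens in this case? There is a pattern here: the time stamp is inside [] brackets, the threadId is inside (), and the module name is inside <>. Can we use this information to extract the tokens directly?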

Answer 1

Use a regular expression. Hope this helps!

import re

string = '[2013-Mar-05 18:21:45.415053] (4444) <Module name> [Logging level]  Message Desciption : This is the message.'

regex = re.compile(r'\[(?P<timestamp>[^\]]*?)\] \((?P<threadid>[^\)]*?)\) \<(?P<modulename>[^\>]*?)\>[^:]*?\:(?P<message>.*?)$')

for match in regex.finditer(string):
    # Build the result from the named groups (the name avoids shadowing the built-in dict)
    result = {'timestamp': match.group("timestamp"), 'threadid': match.group("threadid"), 'modulename': match.group('modulename'), 'message': match.group('message')}

print(result)

Output:

{'timestamp': '2013-Mar-05 18:21:45.415053', 'message': ' This is the message.', 'modulename': 'Module name', 'threadid': '4444'}

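Explanation: I am using named groups to tag parts of the regular expression so that they can be used later in the script. For details, see http://docs.python.org/2/library/re.html. Basically, I walk through the line from left to right, looking for the delimiters [, <, ( and so on.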
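
As a rough sketch of how the named groups above could be applied to more than one line, assuming the same pattern as in this answer: the helper name `parse_line` and the second sample line are made up for illustration, and a line that does not fit the format yields None instead of raising.

```python
import re

# Same pattern as in the answer above, compiled once and reused.
LOG_RE = re.compile(
    r'\[(?P<timestamp>[^\]]*?)\] \((?P<threadid>[^\)]*?)\) '
    r'\<(?P<modulename>[^\>]*?)\>[^:]*?\:(?P<message>.*?)$'
)

def parse_line(line):
    """Return a dict of the named groups, or None if the line does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    return {
        'timestamp': match.group('timestamp'),
        'threadid': match.group('threadid'),
        'modulename': match.group('modulename'),
        # Drop the space that follows the ':' separator.
        'message': match.group('message').strip(),
    }

lines = [
    '[2013-Mar-05 18:21:45.415053] (4444) <Module name> [Logging level]  Message Desciption : This is the message.',
    'not a log line',  # hypothetical malformed input
]
parsed = [parse_line(line) for line in lines]
print(parsed)
```

Checking for None before reading the groups matters in practice, since real log files often contain continuation lines or stack traces that do not follow the format.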

Answer 2

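This is very similar to @Oli's answer, but the regular expression is more readable, and I use groupdict() so there is no need to build a new dictionary, since the regexp creates it. The log string is parsed from left to right, consuming each match.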

import re

fmt = re.compile(
      r'\[(?P<timestamp>.+?)\]\s+' # Save everything within [] to group timestamp
      r'\((?P<thread_id>.+?)\)\s+' # Save everything within () to group thread_id
      r'\<(?P<module_name>.+?)\>\s+' # Save everything within <> to group module_name
      r'\[(?P<log_level>.+?)\]\s+' # Save everything within [] to group log_level
      r'(?P<message_desc>.+?)(\s:\s|$)' # Save everything before \s:\s or end of line to group message_desc
      r'(?P<message>.+$)?' # If there was a \s:\s, save everything after it to group message. This last group is optional
      )

log = '[2013-Mar-05 18:21:45.415053] (4139) <ModuleA> [DEBUG]  Message Desciption : An example message!'

match = fmt.search(log)

print(match.groupdict())

Example:

log = '[2013-Mar-05 18:21:45.415053] (4139) <ModuleA> [DEBUG]  Message Desciption : An example message!'
match = fmt.search(log)

print(match.groupdict())
{'log_level': 'DEBUG',
 'message': 'An example message!',
 'message_desc': 'Message Desciption',
 'module_name': 'ModuleA',
 'thread_id': '4139',
 'timestamp': '2013-Mar-05 18:21:45.415053'}

Example with the first test string from the comments on this answer:

log = '[2013-Mar-05 18:21:45.415053] (0x7aa5e3a0) <Logger> [Info] Opened settings file : /usr/local/ABC/ABC/var/loggingSettings.ini'

match = fmt.search(log)

print(match.groupdict())
{'log_level': 'Info',
 'message': '/usr/local/ABC/ABC/var/loggingSettings.ini',
 'message_desc': 'Opened settings file',
 'module_name': 'Logger',
 'thread_id': '0x7aa5e3a0',
 'timestamp': '2013-Mar-05 18:21:45.415053'}

Example with the second test string from the comments on this answer:

log = '[2013-Mar-05 18:21:45.415053] (0x7aa5e3a0) <Logger> [Info] Creating a new settings file'

match = fmt.search(log)

print(match.groupdict())
{'log_level': 'Info',
 'message': None,
 'message_desc': 'Creating a new settings file',
 'module_name': 'Logger',
 'thread_id': '0x7aa5e3a0',
 'timestamp': '2013-Mar-05 18:21:45.415053'}

Edit: fixed to work with the OP's example.
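
If the exact keys from the question are wanted, the groupdict() result could be renamed afterwards. This is only a sketch, assuming the `fmt` pattern from this answer; the `KEY_MAP` table and the `to_question_keys` helper are hypothetical names, and the 'Logging level' key does not appear in the question's dictionary but follows from the log format.

```python
import re

# Pattern from the answer above.
fmt = re.compile(
    r'\[(?P<timestamp>.+?)\]\s+'
    r'\((?P<thread_id>.+?)\)\s+'
    r'\<(?P<module_name>.+?)\>\s+'
    r'\[(?P<log_level>.+?)\]\s+'
    r'(?P<message_desc>.+?)(\s:\s|$)'
    r'(?P<message>.+$)?'
)

# Group name -> key name used in the question (hypothetical mapping).
KEY_MAP = {
    'timestamp': 'time stamp',
    'thread_id': 'ThreadId',
    'module_name': 'Module name',
    'log_level': 'Logging level',
    'message_desc': 'Message Description',
    'message': 'Message',
}

def to_question_keys(line):
    """Parse one line and rename the groups; return None if it does not match."""
    match = fmt.search(line)
    if match is None:
        return None
    return {KEY_MAP[name]: value for name, value in match.groupdict().items()}

log = '[2013-Mar-05 18:21:45.415053] (4139) <ModuleA> [DEBUG]  Message Desciption : An example message!'
record = to_question_keys(log)
print(record)
```

All values stay strings here; converting the thread ID to a number would need extra care, because IDs such as 0x7aa5e3a0 are hexadecimal.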

Answer 3

How about the following? (The comments explain what is going on.)

log = '[2013-Mar-05 18:21:45.415053] (ThreadID) <Module name> [Logging level]    Message Description : This is the message.'

# Define functions on how to process the different kinds of tokens
time_stamp = logging_level = lambda x: x.strip('[ ]')
thread_ID = lambda x: x.strip('( )')
module_name = lambda x: x.strip('< >')
# Trim the whitespace left around the description and the message
message_description = message = lambda x: x.strip()

# Names of the tokens used to make the dictionary keys
keys = ['time stamp', 'ThreadId',
        'Module name', 'Logging level',
        'Message Description', 'Message']
# Define functions on how to process the message
funcs = [time_stamp, thread_ID,
         module_name, logging_level,
         message_description, message]
# Define the tokens at which to split the message
split_on = [']', ')', '>', ']', ':']

msg_dict = {}

for i in range(len(split_on)):
    # Split up the log one token at a time
    temp, log = log.split(split_on[i], 1)
    # Process the token using the defined function
    msg_dict[keys[i]] = funcs[i](temp) 

# Process the last token: what remains after the final split is the message itself
msg_dict[keys[-1]] = funcs[-1](log)
print(msg_dict)

Answer 4

Although using re is simpler in this case, if you don't want to use it, try this:

string = '[2013-Mar-05 18:21:45.415053] (ThreadID) <Module name> [Logging level]    Message Desciption : This is the message.'

# the main function, return the items between start and end.
def get_between(start, end, string):
    in_between = 0
    c_str = ''
    items = []
    indexes = []
    for i in range(len(string)):
        char = string[i]
        if char == start:
            if in_between == 0: indexes.append(i) # if starting bracket
            in_between += 1
        elif char == end:
            in_between -= 1
            if in_between == 0: indexes.append(i) # if ending bracket
        elif in_between > 0:
            c_str += char
        if in_between == 0 and c_str != '': # after ending bracket
            items.append(c_str)
            c_str = ''
    return items, indexes

# As both Time Stamp, and Logging Level are between []s,
# And as message comes after Logging Level,
data,last_indexes = get_between('[',']',string)
time_stamp, logging = data
# We only want the first item in the first list
thread_id = get_between('(',')',string)[0][0]
module = get_between('<','>',string)[0][0]

last = max(last_indexes)
# Extract the message: everything after the first ':' that follows the last bracket
message = string[last+1:].split(':', 1)[1].strip()

mydict = {'Time':time_stamp, 'Thread ID':thread_id,'Module':module,'Logging Level':logging,'Message':message}
print(mydict)

Here we get the characters between two delimiters and work with them.
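
For the simple case where the delimiter pairs are never nested, the same idea can be sketched more compactly with str.find. This is an illustration under that assumption, not a replacement for the function above: `first_between` is a made-up helper that returns only the first pair found from a given offset.

```python
def first_between(text, start, end, pos=0):
    """Return (substring, index_after_end) for the first start...end pair at or after pos.

    Returns (None, pos) when no complete pair is found.
    """
    begin = text.find(start, pos)
    if begin == -1:
        return None, pos
    finish = text.find(end, begin + len(start))
    if finish == -1:
        return None, pos
    return text[begin + len(start):finish], finish + len(end)

line = '[2013-Mar-05 18:21:45.415053] (ThreadID) <Module name> [Logging level]    Message Desciption : This is the message.'

# Walk the line from left to right, one delimiter pair at a time.
time_stamp, pos = first_between(line, '[', ']')
thread_id, pos = first_between(line, '(', ')', pos)
module, pos = first_between(line, '<', '>', pos)
level, pos = first_between(line, '[', ']', pos)

# Everything after the last bracket is "description : message".
description, _, message = line[pos:].partition(':')
record = {
    'Time': time_stamp,
    'Thread ID': thread_id,
    'Module': module,
    'Logging Level': level,
    'Message Description': description.strip(),
    'Message': message.strip(),
}
print(record)
```

Passing the running offset `pos` along keeps the second [] lookup from finding the time stamp again.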

Answer 5

If your log format is consistent, why not use macro-style constants for the indexes?

Example:

DATE = 0
TIME = 1
TID = 2
MODULE = 3
LOG_LVL = 4
MESSAGE = 7  # index 5 is the message description and index 6 is the ':' separator

log = ['2013-Mar-05', '18:21:45.415053]', '(ThreadID)', '<Module name>', '[Logging level]',    'Message Desciption', ':', 'This is the message.']

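Then just use log[DATE], or not? Before using index-based access, call .join on the pieces that belong together. Then you can fill in the dictionary you want.

It's not as neat as Oli's solution, but it gets the job done :)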
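
The index-based idea could look roughly like the sketch below. It assumes the thread ID, module name and logging level contain no spaces, so that the token positions stay fixed; that does not hold for the '<Module name>' sample in the question. The constant names follow this answer, while `REST` and the sample line are illustrative.

```python
# Positions of the whitespace-separated tokens (only valid when the
# bracketed fields themselves contain no spaces).
DATE = 0
TIME = 1
TID = 2
MODULE = 3
LOG_LVL = 4
REST = 5  # hypothetical name: description and message start here

line = '[2013-Mar-05 18:21:45.415053] (4139) <ModuleA> [DEBUG]  Message Desciption : An example message!'
tokens = line.split()

# Join the pieces that belong together before indexing into them.
time_stamp = ' '.join(tokens[DATE:TIME + 1]).strip('[]')
rest = ' '.join(tokens[REST:])
description, _, message = rest.partition(' : ')

record = {
    'time stamp': time_stamp,
    'ThreadId': tokens[TID].strip('()'),
    'Module name': tokens[MODULE].strip('<>'),
    'Logging level': tokens[LOG_LVL].strip('[]'),
    'Message Description': description,
    'Message': message,
}
print(record)
```

Note that splitting and re-joining collapses runs of whitespace inside the message, which may or may not be acceptable.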

Parsing log messages into tokens in Python

5 answers: