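(python) Finding matching log entries

Time: 2014-02-04 16:07:51

Tags: python pattern-matching conditional

I have a list of tuples containing logon/logoff details. I'm trying to match each logon line with its corresponding logoff line. I want to first match a line that contains 'LOGON', retrieve the user name, and then search for the next line that matches 'LOGOFF' and that user name.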

log_lines =[('2014-01-28 16:54:58', 'LOGON', 'jane', 'machinename'),
('2014-01-28 17:50:18', 'LOGOFF', 'jane', 'machinename'),
('2014-01-28 19:53:02', 'LOGON', 'skip', 'machinename'),
('2014-01-28 19:54:12', 'LOGOFF', 'skip', 'machinename'),
('2014-01-29 09:41:52', 'LOGON', 'jim', 'machinename'),
('2014-01-29 09:42:45', 'LOGOFF', 'jim', 'machinename'),
('2014-01-29 11:59:20', 'LOGON', 'skip', 'machinename'),
('2014-01-29 12:00:52', 'LOGOFF', 'skip', 'machinename')]

for logon in log_lines:
    if logon[1] == 'LOGON':
        name = logon[2]
        print(name)
        print(logon)
        # Look for LOGOFF lines that belong to the same user
        for logoff in log_lines:
            if logoff[1] == 'LOGOFF' and logoff[2] == name:
                print(logoff)

I'm not sure whether nested if statements are the way to go.

5 answers:

Answer 0: (score: 0)

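First, index 0 of each tuple is the timestamp, so logon[0] returns the date. Use logon[1] to retrieve LOGON or LOGOFF, and logon[2] to retrieve the user name (logon[3] is the machine name).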
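
Positional indexes like these are easy to mix up. One way to sidestep that, sketched here with a hypothetical `LogLine` namedtuple that is not part of the question, is to give the fields names. The field names are assumptions based on the sample data.

```python
from collections import namedtuple

# Hypothetical record type; the field names are guesses based on the sample data.
LogLine = namedtuple('LogLine', ['timestamp', 'action', 'user', 'machine'])

raw = ('2014-01-28 16:54:58', 'LOGON', 'jane', 'machinename')
entry = LogLine(*raw)

print(entry.action)  # same value as raw[1]
print(entry.user)    # same value as raw[2]
```

A namedtuple still supports indexing, so existing code that uses entry[1] keeps working.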

Answer 1: (score: 0)

Your algorithm isn't terrible. You can cut down the work a bit by using indexes. For example:

for i in range(len(log_lines)):
    if log_lines[i][1] == 'LOGON':
        name = log_lines[i][2]
        # Start at i + 1 so only later lines are searched
        for j in range(i + 1, len(log_lines)):
            if log_lines[j][1] == 'LOGOFF' and log_lines[j][2] == name:
                print(log_lines[j])

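Doing it this way cuts the running time roughly in half on average. Note that the inner loop starts at the next line rather than at the beginning of the list.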
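
To make the "roughly half" claim concrete, here is a small sketch that only counts how many lines the inner loop visits for the sample data, once when it scans the whole list and once when it starts at the next line. The helper `count_inner_steps` is a hypothetical name introduced for this illustration.

```python
log_lines = [
    ('2014-01-28 16:54:58', 'LOGON', 'jane', 'machinename'),
    ('2014-01-28 17:50:18', 'LOGOFF', 'jane', 'machinename'),
    ('2014-01-28 19:53:02', 'LOGON', 'skip', 'machinename'),
    ('2014-01-28 19:54:12', 'LOGOFF', 'skip', 'machinename'),
    ('2014-01-29 09:41:52', 'LOGON', 'jim', 'machinename'),
    ('2014-01-29 09:42:45', 'LOGOFF', 'jim', 'machinename'),
    ('2014-01-29 11:59:20', 'LOGON', 'skip', 'machinename'),
    ('2014-01-29 12:00:52', 'LOGOFF', 'skip', 'machinename'),
]

def count_inner_steps(lines, from_next):
    """Count the lines an inner loop would visit, without doing the matching."""
    steps = 0
    for i, line in enumerate(lines):
        if line[1] == 'LOGON':
            start = i + 1 if from_next else 0
            steps += len(lines) - start
    return steps

full = count_inner_steps(log_lines, from_next=False)
partial = count_inner_steps(log_lines, from_next=True)
print(full, partial)  # 32 16
```

Both variants are still quadratic in the worst case; starting at the next line only trims the constant factor.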

Answer 2: (score: 0)

Try using next with a slice of log_lines that starts at the following line:

for i, line in enumerate(log_lines):
    if line[1] == 'LOGON':
        found = next(j for j,search in enumerate(log_lines[i+1:],i+1) 
            if search[1] == 'LOGOFF' and line[2] == search[2])
        print('found {} logoff match at index {}'.format(line[2],found))

Output:

found jane logoff match at index 1
found skip logoff match at index 3
found jim logoff match at index 5
found skip logoff match at index 7

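This effectively starts the search at the next line instead of iterating over the whole list looking for 'LOGOFF', and it stops as soon as a match is found. next also offers some flexibility, because you can supply a default value for the case where the generator expression is exhausted without finding a match.

For example: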

found = next((j for j,search in enumerate(log_lines[i+1:],i+1) 
            if search[1] == 'LOGOFF' and line[2] == search[2]), None)

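If we are at the end of the list and the user has not logged off, we get None back instead of an error.

Note that this approach handles multiple logons/logoffs by the same user. Your algorithm doesn't handle that well!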
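
As a minimal sketch of the default-value behaviour, the search can be wrapped in a helper and run against a shortened log in which one user never logs off. The function name `find_logoff_index` and the three-line sample are assumptions made for this illustration.

```python
def find_logoff_index(lines, i):
    """Return the index of the first LOGOFF after lines[i] for the same user, or None."""
    user = lines[i][2]
    return next((j for j, other in enumerate(lines[i + 1:], i + 1)
                 if other[1] == 'LOGOFF' and other[2] == user), None)

# Shortened sample: 'skip' logs on but never logs off.
log_lines = [
    ('2014-01-28 16:54:58', 'LOGON', 'jane', 'machinename'),
    ('2014-01-28 17:50:18', 'LOGOFF', 'jane', 'machinename'),
    ('2014-01-29 11:59:20', 'LOGON', 'skip', 'machinename'),
]

pairs = [(i, find_logoff_index(log_lines, i))
         for i, line in enumerate(log_lines) if line[1] == 'LOGON']
print(pairs)  # [(0, 1), (2, None)]
```

Without the None default, the same call for 'skip' would raise StopIteration.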

Answer 3: (score: 0)

Using slices:

for l in log_lines:
    if l[1] == 'LOGON':
        start = log_lines.index(l)+1
        # No break below, so every later LOGOFF by this user is reported
        for item in log_lines[start:]:
            if (l[2]==item[2]) and (item[1]=='LOGOFF'):
                print(l[2], "found log on and log off")

Output:

jane found log on and log off
skip found log on and log off
skip found log on and log off
jim found log on and log off
skip found log on and log off
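
The user skip shows up three times in that output because the inner loop keeps scanning after the first match, so the first skip logon is also paired with the later skip logoff. A possible variation, assuming only the first LOGOFF after each LOGON should count, adds a break and uses enumerate instead of index:

```python
log_lines = [
    ('2014-01-28 16:54:58', 'LOGON', 'jane', 'machinename'),
    ('2014-01-28 17:50:18', 'LOGOFF', 'jane', 'machinename'),
    ('2014-01-28 19:53:02', 'LOGON', 'skip', 'machinename'),
    ('2014-01-28 19:54:12', 'LOGOFF', 'skip', 'machinename'),
    ('2014-01-29 09:41:52', 'LOGON', 'jim', 'machinename'),
    ('2014-01-29 09:42:45', 'LOGOFF', 'jim', 'machinename'),
    ('2014-01-29 11:59:20', 'LOGON', 'skip', 'machinename'),
    ('2014-01-29 12:00:52', 'LOGOFF', 'skip', 'machinename'),
]

matches = []  # (logon time, logoff time, user)
for i, l in enumerate(log_lines):
    if l[1] == 'LOGON':
        for item in log_lines[i + 1:]:
            if l[2] == item[2] and item[1] == 'LOGOFF':
                matches.append((l[0], item[0], l[2]))
                break  # stop at the first logoff for this logon

for logon_time, logoff_time, user in matches:
    print('{} found log on and log off'.format(user))
```

With the sample data this reports four pairs, one per logon.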

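Answer 4: (score: 0)

The nested-loop approach makes the algorithm O(N^2), even though starting the inner loop at a later index is more efficient. Below is an example of an average-case O(N) approach that doesn't use nested loops (dictionary lookups are O(1) on average).

It also tries to handle some cases of unmatched transactions, assuming that a user's logon must be followed by that user's logoff before the same user logs on again.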

log_lines =[('2014-01-28 16:54:58', 'LOGON', 'jane', 'machinename'),
('2014-01-28 17:50:18', 'LOGOFF', 'jane', 'machinename'),
('2014-01-28 19:53:02', 'LOGON', 'skip', 'machinename'),
('2014-01-28 19:54:12', 'LOGOFF', 'skip', 'machinename'),
('2014-01-29 09:41:52', 'LOGON', 'jim', 'machinename'),
('2014-01-29 09:42:45', 'LOGOFF', 'jim', 'machinename'),
('2014-01-29 11:59:20', 'LOGON', 'skip', 'machinename'),
('2014-01-29 12:00:52', 'LOGOFF', 'skip', 'machinename'),
# Following are made up, weird logs
('2014-01-29 12:00:52', 'LOGOFF', 'dooz', 'machinename'),
('2014-01-29 12:00:52', 'LOGOFF', 'booz', 'machinename'),
('2014-01-29 12:00:52', 'LOGON', 'fooz', 'machinename'),]

from pprint import pprint

logged_in = {}
transactions_matched = []
transactions_weird = []
for line in log_lines:
    action = line[1]
    user = line[2]
    if action == 'LOGON':
        if user not in logged_in:
            logged_in[user] = line
        else:  # Abnormal case 1: LOGON while the user is already logged on
            transactions_weird.append(logged_in.pop(user))
            logged_in[user] = line
    elif action == 'LOGOFF':
        if user in logged_in:
            transactions_matched.append((logged_in.pop(user), line))
        else:  # Abnormal case 2: LOGOFF from a user who never logged on
            transactions_weird.append(line)

# Dangling log-in actions, considered as abnormal
transactions_weird.extend(logged_in.values())

print('Matched:')
pprint(transactions_matched)
print('Weird:')
pprint(transactions_weird)

Output:

Matched:
[(('2014-01-28 16:54:58', 'LOGON', 'jane', 'machinename'),
  ('2014-01-28 17:50:18', 'LOGOFF', 'jane', 'machinename')),
 (('2014-01-28 19:53:02', 'LOGON', 'skip', 'machinename'),
  ('2014-01-28 19:54:12', 'LOGOFF', 'skip', 'machinename')),
 (('2014-01-29 09:41:52', 'LOGON', 'jim', 'machinename'),
  ('2014-01-29 09:42:45', 'LOGOFF', 'jim', 'machinename')),
 (('2014-01-29 11:59:20', 'LOGON', 'skip', 'machinename'),
  ('2014-01-29 12:00:52', 'LOGOFF', 'skip', 'machinename'))]
Weird:
[('2014-01-29 12:00:52', 'LOGOFF', 'dooz', 'machinename'),
 ('2014-01-29 12:00:52', 'LOGOFF', 'booz', 'machinename'),
 ('2014-01-29 12:00:52', 'LOGON', 'fooz', 'machinename')]
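
Once logons and logoffs are paired like this, one possible follow-up is to compute how long each session lasted. The sketch below assumes the timestamps always use the format shown in the sample data; the helper `session_seconds` and the constant `FMT` are hypothetical names, and only the first two matched pairs are repeated to keep it short.

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'  # assumed timestamp format, taken from the sample lines

# First two matched pairs from the output above
transactions_matched = [
    (('2014-01-28 16:54:58', 'LOGON', 'jane', 'machinename'),
     ('2014-01-28 17:50:18', 'LOGOFF', 'jane', 'machinename')),
    (('2014-01-28 19:53:02', 'LOGON', 'skip', 'machinename'),
     ('2014-01-28 19:54:12', 'LOGOFF', 'skip', 'machinename')),
]

def session_seconds(logon, logoff):
    """Return the session length in seconds for one (logon, logoff) pair."""
    start = datetime.strptime(logon[0], FMT)
    end = datetime.strptime(logoff[0], FMT)
    return (end - start).total_seconds()

durations = [(logon[2], session_seconds(logon, logoff))
             for logon, logoff in transactions_matched]
print(durations)  # [('jane', 3320.0), ('skip', 70.0)]
```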