How to parse multi-line catalina logs in Python - regular expressions

Time: 2018-03-03 12:39:01

Tags: python regex parsing catalina

I have a catalina log:

oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated
WARNING: HTTP Session created without LoggedInSessionBean
oct 21, 2016 3:03:20 AM com.sun.jersey.spi.container.ContainerResponse logException
SEVERE: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException
    at ais.api.rest.rdss.Resource.lookAT(Resource.java:22)
    at sun.reflect.GeneratedMethodAccessor3019.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

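I am trying to parse it in Python. My problem is that I don't know how many lines a log entry spans; the minimum is 2. I read from the file, and when a line starts with j, m, s, o and so on, I treat it as the first line of an entry, because those are the first letters of month names. But I don't know how to continue. When do I stop reading lines? When the next line starts with one of those letters? And how do I do that?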

import datetime
import re

import MySQLdb

SPACE = r'\s'
TIME = r'(?P<time>.*?M)'
PATH = r'(?P<path>.*?\S)'
METHOD = r'(?P<method>.*?\S)'
REQUEST = r'(?P<request>.*)'
TYPE = r'(?P<type>.*?\:)'

REGEX = TIME+SPACE+PATH+SPACE+METHOD+SPACE+TYPE+SPACE+REQUEST

def parser(log_line):
    # Note: re.search returns None when a line does not contain every field.
    match = re.search(REGEX, log_line)
    return (match.group('time'),
            match.group('path'),
            match.group('method'),
            match.group('type'),
            match.group('request'))

db = MySQLdb.connect(host="localhost", user="myuser", passwd="mypsswd", db="Database")

with db:
    cursor = db.cursor()

    # Open read-only; "rw" is not a valid mode.
    with open("Mylog.log", "r") as f:
        for line in f:
            # Month names start with one of these letters, so such a line
            # is taken to be the first line of a log entry.
            if line.startswith(('j', 'f', 'm', 'a', 's', 'o', 'n', 'd')):
                logLine = line
                result = parser(logLine)

                sql = ("INSERT INTO ..... ")
                data = (result[0],)  # query parameters must be a tuple
                cursor.execute(sql, data)

db.close()

My best idea is to read only two lines at a time. But that means discarding all the other data. There must be a better way.

I want to read the lines like this: 1.line - oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated WARNING: HTTP Session created without LoggedInSessionBean

2.line - oct 21, 2016 3:03:20 AM com.sun.jersey.spi.container.ContainerResponse logException SEVERE: Mapped exception to response: 500 (Internal Server Error) javax.ws.rs.WebApplicationException at ais.api.rest.rdss.Resource.lookAT(Resource.java:22) at sun.reflect.GeneratedMethodAccessor3019.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

3.line - oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated WARNING: HTTP Session created without LoggedInSessionBean

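So I want to start reading when a line begins with a datetime (that part is no problem). The problem is that I want to stop reading when the next line begins with a datetime.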

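1 answer:

Answer 0 (score: 0)

This may be what you want.

I read the lines of the log in a generator so that I can decide whether each one is a datetime line or some other line. Also, and importantly, the generator lets me signal that the end of the log file has been reached.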
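To show why matching the whole timestamp is safer than testing the first letter, here is a small sketch that runs both checks on lines from the log in the question. The names `DATE_RE` and `is_date_line` are only illustrative; the pattern is the same one used in the generator.

```python
import re

# Timestamp pattern, compiled once. re.I lets it accept "oct" as well as "Oct".
DATE_RE = re.compile(
    r'[a-z]{3}\s+[0-9]{1,2},\s+[0-9]{4}\s+[0-9]{1,2}:[0-9]{2}:[0-9]{2}\s+[AP]M',
    re.I,
)

def is_date_line(line):
    """Return True if the line begins with a catalina-style timestamp."""
    return DATE_RE.match(line) is not None

samples = [
    'oct 21, 2016 3:03:20 AM com.sun.jersey.spi.container.ContainerResponse logException',
    'SEVERE: Mapped exception to response: 500 (Internal Server Error)',
    'javax.ws.rs.WebApplicationException',
]

for s in samples:
    # The first-letter test from the question, for comparison.
    first_letter = s.startswith(tuple('jfmasond'))
    print(is_date_line(s), first_letter)
```

The last sample starts with "j", so the first-letter test would wrongly treat the exception line as the start of a new entry, while the timestamp pattern rejects it.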

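In the main loop of the program, when I get a datetime line I start accumulating lines in a list. Each time I see a datetime line, I first print the accumulated list if it is not empty. Since the program will still be holding one complete entry when the file ends, I also arrange to print the accumulated lines at that point.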

import re

a_date, other, EOF = 0,1,2

def One_line():
    with open('caroline.txt') as caroline:
        for line in caroline:
            line = line.strip()
            m = re.match(r'[a-z]{3}\s+[0-9]{1,2},\s+[0-9]{4}\s+[0-9]{1,2}:[0-9]{2}:[0-9]{2}\s+[AP]M', line, re.I)
            if m:
                yield a_date, line
            else:
                yield other, line
    yield EOF, ''

complete_line = []
for kind, content in One_line():
    if kind in [a_date, EOF]:
        if complete_line:
            print (' '.join(complete_line ))
        complete_line = [content]
    else:
        complete_line.append(content)

Output:

oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated WARNING: HTTP Session created without LoggedInSessionBean
oct 21, 2016 3:03:20 AM com.sun.jersey.spi.container.ContainerResponse logException SEVERE: Mapped exception to response: 500 (Internal Server Error) javax.ws.rs.WebApplicationException at ais.api.rest.rdss.Resource.lookAT(Resource.java:22) at sun.reflect.GeneratedMethodAccessor3019.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
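Once each entry is joined into one string, it can be split into the fields named in the question (time, path, method, type, request) before it goes into the database. The following is only a sketch: `ENTRY_RE` and `parse_entry` are names I made up, the pattern assumes every entry has the same shape as the two shown above, and `strptime` with `%b` and `%p` assumes an English locale.

```python
import re
from datetime import datetime

# Assumed layout: timestamp, class path, method name, level, then the message.
ENTRY_RE = re.compile(
    r'(?P<time>[A-Za-z]{3}\s+\d{1,2},\s+\d{4}\s+\d{1,2}:\d{2}:\d{2}\s+[AP]M)\s+'
    r'(?P<path>\S+)\s+(?P<method>\S+)\s+(?P<type>[A-Z]+):\s+(?P<request>.*)'
)

def parse_entry(entry):
    """Split one joined log entry into a dict of fields, or return None."""
    m = ENTRY_RE.match(entry)
    if m is None:
        return None
    fields = m.groupdict()
    # Turn the timestamp text into a datetime object (12-hour clock with AM/PM).
    fields['time'] = datetime.strptime(fields['time'], '%b %d, %Y %I:%M:%S %p')
    return fields

entry = ('oct 21, 2016 12:32:13 AM '
         'org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener '
         'sessionCreated WARNING: HTTP Session created without LoggedInSessionBean')

fields = parse_entry(entry)
print(fields['time'], fields['type'], fields['method'])
# prints: 2016-10-21 00:32:13 WARNING sessionCreated
```

Returning None for entries that do not fit the pattern lets the caller skip or log them instead of failing on `match.group`.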