I have a catalina log:
oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated
WARNING: HTTP Session created without LoggedInSessionBean
oct 21, 2016 3:03:20 AM com.sun.jersey.spi.container.ContainerResponse logException
SEVERE: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException
at ais.api.rest.rdss.Resource.lookAT(Resource.java:22)
at sun.reflect.GeneratedMethodAccessor3019.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
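I am trying to parse it in Python. My problem is that I don't know how many lines each log entry has. The minimum is 2. I tried reading from the file: when a line starts with j, m, s, o, etc., it is the first line of an entry, because those are the first letters of the month names. But I don't know how to continue. When do I stop reading lines? When the next line starts with one of those letters? And how do I do that?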
import datetime
import re
import MySQLdb

SPACE = r'\s'
TIME = r'(?P<time>.*?M)'
PATH = r'(?P<path>.*?\S)'
METHOD = r'(?P<method>.*?\S)'
REQUEST = r'(?P<request>.*)'
TYPE = r'(?P<type>.*?\:)'
REGEX = TIME + SPACE + PATH + SPACE + METHOD + SPACE + TYPE + SPACE + REQUEST

def parser(log_line):
    # Expects the whole log entry (header plus message) in one string
    match = re.search(REGEX, log_line)
    return (match.group('time'),
            match.group('path'),
            match.group('method'),
            match.group('type'),
            match.group('request'))

db = MySQLdb.connect(host="localhost", user="myuser", passwd="mypsswd", db="Database")
with db:
    cursor = db.cursor()
    # "rw" is not a valid mode; the file is only read here
    with open("Mylog.log", "r") as f:
        for line in f:
            # A line starting with a month initial is taken as the first line of an entry
            if line.startswith(('j', 'f', 'm', 'a', 's', 'o', 'n', 'd')):
                logLine = line
                result = parser(logLine)
                sql = ("INSERT INTO ..... ")
                data = (result[0],)  # query parameters must be a tuple
                cursor.execute(sql, data)
# The with block already closes the file, so f.close() is not needed
db.close()
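My best idea is to read only two lines at a time. But that means discarding all the other data. There must be a better way.
I want to read the lines like this: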
1.line - oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated WARNING: HTTP Session created without LoggedInSessionBean
2.line - oct 21, 2016 3:03:20 AM com.sun.jersey.spi.container.ContainerResponse logException SEVERE: Mapped exception to response: 500 (Internal Server Error) javax.ws.rs.WebApplicationException at ais.api.rest.rdss.Resource.lookAT(Resource.java:22) at sun.reflect.GeneratedMethodAccessor3019.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
3.line - oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated WARNING: HTTP Session created without LoggedInSessionBean
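So I want to start reading when a line starts with a datetime (that part is no problem). The problem is that I want to stop reading when the next line starts with a datetime.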
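Answer 0: (score: 0)
This may be what you want.
I read the lines from the log in a generator so that I can classify each one as a datetime line or some other line. Just as importantly, the generator can signal that the end of the log file has been reached.
In the main loop of the program, a datetime line starts a new list of accumulated lines. Each time I see a datetime line, I first print the list accumulated so far, if it is not empty. Because one complete entry is still sitting in the list when the file ends, I also print the accumulated lines at that point.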
import re

# Kinds of item the generator can yield
a_date, other, EOF = 0, 1, 2

def One_line():
    with open('caroline.txt') as caroline:
        for line in caroline:
            line = line.strip()
            # A line that begins with a timestamp such as "oct 21, 2016 12:32:13 AM"
            m = re.match(r'[a-z]{3}\s+[0-9]{1,2},\s+[0-9]{4}\s+[0-9]{1,2}:[0-9]{2}:[0-9]{2}\s+[AP]M', line, re.I)
            if m:
                yield a_date, line
            else:
                yield other, line
    # Signal the end of the file so the last entry gets flushed
    yield EOF, ''

complete_line = []
for kind, content in One_line():
    if kind in [a_date, EOF]:
        # A new entry starts (or the file ended): print the previous one
        if complete_line:
            print(' '.join(complete_line))
        complete_line = [content]
    else:
        complete_line.append(content)
Output:
oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated WARNING: HTTP Session created without LoggedInSessionBean
oct 21, 2016 3:03:20 AM com.sun.jersey.spi.container.ContainerResponse logException SEVERE: Mapped exception to response: 500 (Internal Server Error) javax.ws.rs.WebApplicationException at ais.api.rest.rdss.Resource.lookAT(Resource.java:22) at sun.reflect.GeneratedMethodAccessor3019.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
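Since the goal in the question is to insert each entry into a database, it may help to keep the grouped lines as structured fields instead of one joined string. The following is only a sketch of that idea, not part of the original answer: the names `HEADER`, `group_records`, `parse_record` and `SAMPLE` are made up for illustration, and it assumes that every entry starts with a header of the form "timestamp class method", that the second line is "LEVEL: message", and that the month abbreviations are English ones that `datetime.strptime` understands under the default locale.

```python
import re
from datetime import datetime

# Assumed header layout: "oct 21, 2016 12:32:13 AM <class> <method>"
HEADER = re.compile(
    r'(?P<time>[a-z]{3}\s+\d{1,2},\s+\d{4}\s+\d{1,2}:\d{2}:\d{2}\s+[AP]M)'
    r'\s+(?P<path>\S+)\s+(?P<method>\S+)',
    re.IGNORECASE,
)

def group_records(lines):
    """Yield one list of stripped lines per log entry."""
    record = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # A header line closes the previous entry and starts a new one
        if HEADER.match(line) and record:
            yield record
            record = []
        record.append(line)
    # Flush the entry that is still open when the input ends
    if record:
        yield record

def parse_record(record):
    """Turn one grouped entry into a dict, or None if it has no usable header."""
    header = HEADER.match(record[0])
    if header is None or len(record) < 2:
        return None
    # Second line is assumed to look like "SEVERE: some message"
    level, _, message = record[1].partition(':')
    return {
        'time': datetime.strptime(header.group('time'), '%b %d, %Y %I:%M:%S %p'),
        'path': header.group('path'),
        'method': header.group('method'),
        'level': level.strip(),
        'message': message.strip(),
        'trace': record[2:],  # remaining lines, e.g. a Java stack trace
    }

# The log lines from the question, used as sample input
SAMPLE = [
    'oct 21, 2016 12:32:13 AM org.wso2.carbon.identity.sso.agent.saml.SSOAgentHttpSessionListener sessionCreated',
    'WARNING: HTTP Session created without LoggedInSessionBean',
    'oct 21, 2016 3:03:20 AM com.sun.jersey.spi.container.ContainerResponse logException',
    'SEVERE: Mapped exception to response: 500 (Internal Server Error)',
    'javax.ws.rs.WebApplicationException',
    'at ais.api.rest.rdss.Resource.lookAT(Resource.java:22)',
    'at sun.reflect.GeneratedMethodAccessor3019.invoke(Unknown Source)',
    'at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)',
]

if __name__ == '__main__':
    for rec in group_records(SAMPLE):
        print(parse_record(rec))
```

Because `group_records` accepts any iterable of lines, an open file object can be passed to it directly, and the resulting dict values can then be handed to `cursor.execute` as query parameters.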