I am analyzing data from a log file. My log file looks like this:
[2018-07-13 03:04:57] production.DEBUG: No problem MemId: 000MemId or CardNo
There is no staff information
MemId: 2956144 without the bird - file; mbs
There is no staff information
There is no staff information
[2018-07-13 03:06:07] production.DEBUG: No problem MemId: 00mem_id or CardNo
I want to create a DataFrame in pandas. My expected result:
TimeStand Screen Level messenger
2018-07-13 03:04:57 production DEBUG No problem MemId...staff information
2018-07-13 03:06:07 production DEBUG No problem MemId: 00mem_id or CardNo
I considered using regular expressions, but I am a beginner in Python.
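Answer 0 (score: 0)
I have written some code. You did not specify much, but it should show you how to use regular expressions, and you can adapt it from there. You can also look up str.strip to strip off the unwanted characters.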
import re
import pandas as pd

# Sample log flattened onto one line; the trailing '[2018...etc]' gives the last message a '[' to stop at.
st = '[2018-07-13 03:04:57] production.DEBUG: No problem MemId: 000MemId or CardNo There is no staff information MemId: 2956144 without the bird - file; mbs There is no staff information here is no staff information [2018-07-13 03:06:07] production.DEBUG: No problem MemId: 00mem_id or CardNo [2018...etc]'
# Bracketed timestamps such as '[2018-07-13 03:04:57]'
timelist = re.findall(r'\[\w\S*\s\w*\S*]', st)
df = pd.DataFrame({'TimeStand': timelist})
screenlist = re.findall(r'\bproduction\b', st)
df['Screen'] = screenlist
levellist = re.findall(r'\bDEBUG\b', st)
df['Level'] = levellist
# Everything from the first ': ' after the level up to the next '['
messengerlist = re.findall(r': .*?\[', st)
df['Messenger'] = messengerlist
The output looks like this:
TimeStand Screen Level \
0 [2018-07-13 03:04:57] production DEBUG
1 [2018-07-13 03:06:07] production DEBUG
Messenger
0 : No problem MemId: 000MemId or CardNo There i...
1 : No problem MemId: 00mem_id or CardNo [
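The str.strip suggestion can be sketched as follows. This is a minimal illustration, assuming a DataFrame shaped like the output above; the two rows are copied from the sample log and the first message is shortened for readability.

```python
import pandas as pd

# Two rows shaped like the output above (first message shortened for readability).
df = pd.DataFrame({
    'TimeStand': ['[2018-07-13 03:04:57]', '[2018-07-13 03:06:07]'],
    'Messenger': [
        ': No problem MemId: 000MemId or CardNo There is no staff information [',
        ': No problem MemId: 00mem_id or CardNo [',
    ],
})

# Series.str.strip removes any of the listed characters from both ends of each value.
df['TimeStand'] = df['TimeStand'].str.strip('[]')
df['Messenger'] = df['Messenger'].str.strip(':[ ')

print(df)
```

Note that str.strip works on a set of characters, so a message that legitimately begins or ends with ':', '[' or a space would lose those characters too.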
Answer 1 (score: 0)
I came up with this solution. My code is:
import re
import numpy as np
import pandas as pd
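# Read the whole log; the with-block closes the file when done.
with open(r"C:/laravel-2019-06-01.log", "r", encoding='utf-8-sig') as v:
    st = v.read()
# Append a dummy date so the last entry has a following '[YYYY-MM-DD' to stop at.
st = st + '[2018-07-14]'
st = st.replace('\n', ' ')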
timelist=re.findall(r'\d{4}[-/]\d{2}[-/]\d{2} \d{2}[:]\d{2}[:]\d{2}',st)
df=pd.DataFrame({'TimeStand': timelist})
screenlist=re.findall(r'\d{2}[:]\d{2}[:]\d{2}\].*?\.',st)
df['TimeStand'] = df['TimeStand'].str.strip('][')
df['Screen']=screenlist
df['Screen'] = df['Screen'].map(lambda x: str(x)[10:])
df['Screen'] = df['Screen'].map(lambda x: str(x)[:-1])
levellist=re.findall(r'\d{2}[:]\d{2}[:]\d{2}\].*?\..*?\:',st)
df['Level']=levellist
df['Level'] = df['Level'].map(lambda x: str(x)[21:])
df['Level'] = df['Level'].map(lambda x: str(x)[:-1])
messengerlist=re.findall(r'\d{2}[:]\d{2}[:]\d{2}\].*?\..*?\: .*?\[\d{4}[-/]\d{2}[-/]\d{2}',st)
df['Messenger']=messengerlist
# The prefix 'HH:MM:SS] production.LEVEL:' is 22 characters plus the length of the level name,
# so the offset can be computed per row instead of listing every level by hand.
df['Messenger'] = [m[22 + len(lvl):] for m, lvl in zip(df['Messenger'], df['Level'])]
df['Messenger'] = df['Messenger'].map(lambda x: str(x)[:-11])
print(df)
I hope this solution helps anyone who needs it. Thank you very much.
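As an alternative to fixed slice offsets, a single pattern with capture groups can pull out all four fields at once. The sketch below is only one possible approach, not the answer's method: it assumes every entry starts with a '[YYYY-MM-DD HH:MM:SS] channel.LEVEL: ' header as in the sample log, and the names entry_re and log_text are illustrative.

```python
import re
import pandas as pd

# Sample text copied from the question; a real script would read it from the log file.
log_text = """[2018-07-13 03:04:57] production.DEBUG: No problem MemId: 000MemId or CardNo
There is no staff information
MemId: 2956144 without the bird - file; mbs
There is no staff information
There is no staff information
[2018-07-13 03:06:07] production.DEBUG: No problem MemId: 00mem_id or CardNo
"""

entry_re = re.compile(
    r'\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] '  # timestamp inside brackets
    r'(\w+)\.(\w+): '                              # channel.LEVEL:
    r'(.*?)'                                       # message, may span several lines
    r'(?=\n\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\]|\Z)',  # stop before the next entry or at the end
    re.DOTALL,
)

# Collapse newlines and repeated spaces inside each message into single spaces.
rows = [
    (ts, channel, level, ' '.join(message.split()))
    for ts, channel, level, message in entry_re.findall(log_text)
]
df = pd.DataFrame(rows, columns=['TimeStand', 'Screen', 'Level', 'Messenger'])
print(df)
```

Because the level and channel are captured rather than sliced by position, this version does not depend on the channel being exactly "production" or on the length of the level name.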
Answer 2 (score: -1)
I suggest starting with code along these lines:
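import pandas as pd

# Each column needs its own list; unpacking a single [] into four names raises ValueError.
TimeStand, Screen, Level, messenger = [], [], [], []

with open('log.txt', 'r') as log:
    for line in log:
        # The conditions and appended values are placeholders left for the reader to fill in.
        if ...:
            TimeStand.append(...)
        elif ...:
            Screen.append(...)

df = pd.DataFrame({'TimeStand': TimeStand, 'Screen': Screen, 'Level': Level, 'messenger': messenger})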
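One way the line-by-line skeleton might be filled in is sketched below. It is an assumption about what the placeholders could become, not the answerer's code: header_re and parse_log_lines are hypothetical names, and the rule "a line that does not start with a bracketed timestamp continues the previous message" is inferred from the sample log.

```python
import re
import pandas as pd

# A header line looks like '[timestamp] channel.LEVEL: message'.
header_re = re.compile(r'^\[(.+?)\] (\w+)\.(\w+): (.*)$')

def parse_log_lines(lines):
    """Build a DataFrame from an iterable of log lines (hypothetical helper)."""
    records = []
    for line in lines:
        line = line.rstrip('\n')
        match = header_re.match(line)
        if match:
            # A new entry starts here.
            ts, channel, level, message = match.groups()
            records.append({'TimeStand': ts, 'Screen': channel,
                            'Level': level, 'messenger': message})
        elif records and line.strip():
            # Continuation line: append it to the message of the current entry.
            records[-1]['messenger'] += ' ' + line.strip()
    return pd.DataFrame(records, columns=['TimeStand', 'Screen', 'Level', 'messenger'])

sample = [
    '[2018-07-13 03:04:57] production.DEBUG: No problem MemId: 000MemId or CardNo\n',
    'There is no staff information\n',
    'MemId: 2956144 without the bird - file; mbs\n',
    'There is no staff information\n',
    'There is no staff information\n',
    '[2018-07-13 03:06:07] production.DEBUG: No problem MemId: 00mem_id or CardNo\n',
]
df = parse_log_lines(sample)
print(df)
```

An open file object can be passed in place of the sample list, for example `with open('log.txt') as log: df = parse_log_lines(log)`, since iterating over a file yields its lines.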