I have a column of logs in a pandas DataFrame:
col
Sequential mode! HostOsCheck fails, so bye!
[c01][OK][HostOsCheck] Skip cji02 because it is DOWN
[c01][Stage 3] 2/3 checks passed
[c01][FAIL][HostOsCheck] Percentage working
[c01][FAIL][HostOsCheck] Percentage working
[c02][OK][ILOStatusCheck] Percentage of working
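If a string contains [OK], the check passed; if it contains [FAIL], the check failed.
I want to extract the check type (the name ending in Check), the cluster name, and the status (passed or failed) from each log line into separate columns of the same df, like this: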
col | cluster | Status | name
Sequential mode! HostOsCheck fails, so bye! | c01 | NA | HostOsCheck
[c01][OK][HostOsCheck] Skip cji02 because it is DOWN | c01 | OK | HostOsCheck
[c01][Stage 3] 2/3 checks passed | c01 | NA | NA
[c01][FAIL][HostOsCheck] Percentage working | c01 | FAIL | HostOsCheck
[c01][FAIL][HostOsCheck] Percentage working | c01 | FAIL | HostOsCheck
[c02][OK][ILOStatusCheck] Percentage of working | c02 | OK | ILOStatusCheck
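A string can contain any log message, but the status is always inside []: [OK] if the check passed, [FAIL] if it failed. The name of the check is also inside [].
I know I can use a regular expression with col.str, so I tried the following: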
df['name'] = msg.str.extract(r'([\w{1,}Check])', expand = True)
But instead of the full check names such as HostOsCheck, I get single characters:
0
0 f
1 f
2 S
3 f
4 f
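Why only one character comes back: inside square brackets, `\w{1,}Check` is not a sequence but a character class, so the whole pattern matches a single character drawn from the word characters plus `{`, `1`, `,` and `}`. A quick check with Python's `re` module on one sample line from the log (chosen here for illustration) shows the difference once the literal brackets are escaped:

```python
import re

line = "[c01][OK][HostOsCheck] Skip cji02 because it is DOWN"

# Character class: matches exactly one character, here the "c" of "c01"
# ("[" is not a word character, so it is skipped)
one_char = re.search(r"[\w{1,}Check]", line).group()

# Escaped brackets plus a real group: "[", one or more word characters
# ending in "Check", then "]"
full_name = re.search(r"\[(\w{1,}Check)\]", line).group(1)

print(one_char, full_name)
```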
The same happens with the status:
df['status'] = msg.str.extract(r'([OK|FAIL])', expand = True)
0
0 O
1 O
2 O
3 O
4 O
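The status pattern has the same problem: `[OK|FAIL]` is a character class matching one of `O`, `K`, `|`, `F`, `A`, `I`, `L`, so `extract` returns a single letter. Alternation needs parentheses, with the brackets escaped as literals. A small sketch on a two-row Series built from lines of the log above (`msg` here is a stand-in for the question's column):

```python
import pandas as pd

msg = pd.Series([
    "[c01][OK][HostOsCheck] Skip cji02 because it is DOWN",
    "[c01][FAIL][HostOsCheck] Percentage working",
])

# Character class: one letter per row, the first uppercase O/K/F/A/I/L found
wrong = msg.str.extract(r"([OK|FAIL])", expand=False)

# Literal brackets around a grouped alternation: the whole status word
right = msg.str.extract(r"\[(OK|FAIL)\]", expand=False)

print(wrong.tolist(), right.tolist())
```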
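Edit:
Figured it out: I had left out the backslashes that escape the literal [ and ]. Without them, [...] is a character class that matches a single character, which is why only one letter came back. The working pattern: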
msg.str.extract(r'\[(\w{1,}Check)\]', expand = True)
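The same escaping idea extends to all three requested columns. This sketch assumes `msg` in the question is `df['col']`, that cluster tags look like `c` followed by digits, and that the only statuses are `OK` and `FAIL`. One design choice differs from the bracketed pattern: the name pattern does not require brackets, so the bare `HostOsCheck` in the first line is also picked up, as in the desired table, while lowercase "checks" is not. The first line has no `[cNN]` tag, so its cluster comes out missing; if it should inherit `c01` as in the desired table, something like `df["cluster"].bfill()` could fill it, assuming it belongs with the next tagged line.

```python
import pandas as pd

df = pd.DataFrame({"col": [
    "Sequential mode! HostOsCheck fails, so bye!",
    "[c01][OK][HostOsCheck] Skip cji02 because it is DOWN",
    "[c01][Stage 3] 2/3 checks passed",
    "[c01][FAIL][HostOsCheck] Percentage working",
    "[c01][FAIL][HostOsCheck] Percentage working",
    "[c02][OK][ILOStatusCheck] Percentage of working",
]})

# Each field is extracted separately, so a missing field (e.g. no status)
# yields NaN for that column without affecting the others.
df["cluster"] = df["col"].str.extract(r"\[(c\d+)\]", expand=False)
df["Status"] = df["col"].str.extract(r"\[(OK|FAIL)\]", expand=False)

# No brackets required: matches "HostOsCheck" even outside [] (first line);
# case-sensitive, so "checks" in "2/3 checks passed" does not match.
df["name"] = df["col"].str.extract(r"(\w+Check)\b", expand=False)

print(df[["cluster", "Status", "name"]])
```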
Answer (score: 0)
Assuming a.txt contains:
[c01][OK][HostOsCheck] Skip cji02 because it is DOWN
[c01][Stage 3] 2/3 checks passed
[c01][FAIL][HostOsCheck] Percentage working
[c01][FAIL][HostOsCheck] Percentage working
[c02][OK][ILOStatusCheck] Percentage of working
Python code:
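import pandas as pd

rows = []
with open('a.txt', 'r') as f:
    for line in f:
        # "[c01][OK][HostOsCheck] msg" -> ['[c01', '[OK', '[HostOsCheck', ' msg']
        parts = line.rstrip('\n').split(']')
        rows.append([p.replace('[', '').strip() for p in parts])

# Lines with fewer than four fields are padded with missing values;
# a line with more than four fields would raise a ValueError here.
df = pd.DataFrame(rows, columns=['Cluster', 'Status', 'Name', 'Reason'])
print(df)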
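The split-based answer relies on position, so a line like `[c01][Stage 3] 2/3 checks passed` puts `Stage 3` into `Status`. Below is a sketch of an alternative that parses each line with one regex of optional named groups, which `str.extract` turns into column names. `io.StringIO` stands in for `a.txt` so the example runs on its own; the group names and the `c\d+` cluster format are assumptions, not part of the original answer.

```python
import io
import pandas as pd

# Stand-in for a.txt; in real use, read the file with open('a.txt') instead
sample = io.StringIO(
    "[c01][OK][HostOsCheck] Skip cji02 because it is DOWN\n"
    "[c01][Stage 3] 2/3 checks passed\n"
    "[c01][FAIL][HostOsCheck] Percentage working\n"
    "[c01][FAIL][HostOsCheck] Percentage working\n"
    "[c02][OK][ILOStatusCheck] Percentage of working\n"
)
lines = pd.Series(sample.read().splitlines())

# Each bracketed field is optional; named groups become the column names.
# Anything not recognized (such as "[Stage 3]") stays in Reason.
pattern = (
    r"^(?:\[(?P<cluster>c\d+)\])?"
    r"(?:\[(?P<Status>OK|FAIL)\])?"
    r"(?:\[(?P<name>\w+Check)\])?"
    r"\s*(?P<Reason>.*)$"
)
df = lines.str.extract(pattern)
print(df)
```

Unlike the positional split, a field that is absent simply comes back as NaN instead of shifting the remaining fields into the wrong columns.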