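I'm new to automated testing. I've run into a problem: I want to pull the JSON-formatted data out of a log and then parse it in Python. The raw log looks like this: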
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): x-shard: loc=118.7234160,32.0320550
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): Host: stargate.ele.me
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): Connection: Keep-Alive
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): Accept-Encoding: gzip
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): User-Agent: okhttp/3.5.0
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859):
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): {
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): "transactionId": "4ac50bcb358d376d4719a413b31c4786",
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): "commandType": "UNLOCK",
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): "deviceId": "CD1103929",
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): "token": "CD1103929",
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): "resultDetails": "success",
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): "invokerType": "USEREND",
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): "logisticsOrderCategory": 0,
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): "logisticsOrderId": 0,
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): "commandAt": 1544759360619,
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): "invokerId": 96944200,
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): "deviceUnlockTime": 162,
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): "logisticsOrderType": 0
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): }
I tried this regular expression on the regex101 website: \{(?:[^\{\}]|\{(?:[^\{\}]|\{(?:[^\{\}]|\{(?:[^\{\}]|\{(?:[^\{\}]|\{(?:[^\{\}]|\{(?:[^\{\}]|\{(?:[^\{\}]|\{(?:[^\{\}]|\{(?:[^\{\}]|w+)*\})*\})*\})*\})*\})*\})*\})*\})*\})*\}
and exported the matches as CSV, but this is what I got:
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): ""transactionId"": ""4ac50bcb358d376d4719a413b31c4786"",
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): ""commandType"": ""UNLOCK"",
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): ""deviceId"": ""CD1103929"",
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): ""token"": ""CD1103929"",
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): ""resultDetails"": ""success"",
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): ""invokerType"": ""USEREND"",
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): ""logisticsOrderCategory"": 0,
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): ""logisticsOrderId"": 0,
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): ""commandAt"": 1544759360619,
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): ""invokerId"": 96944200,
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): ""deviceUnlockTime"": 162,
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): ""logisticsOrderType"": 0
12-14 11:49:23.869 D/me.ele.minimart.http.interceptor.HttpLogger(859): }"
But what I actually want is this:
{
"transactionId": "4ac50bcb358d376d4719a413b31c4786",
"commandType": "UNLOCK",
"deviceId": "CD1103929",
"token": "CD1103929",
"resultDetails": "success",
"invokerType": "USEREND",
"logisticsOrderCategory": 0,
"logisticsOrderId": 0,
"commandAt": 1544759360619,
"invokerId": 96944200,
"deviceUnlockTime": 162,
"logisticsOrderType": 0
}
That is, with the useless log prefixes removed. How can I get the result in JSON format? There may be a mistake in my regular expression.
Thanks a lot!
Answer 0 (score: 1)
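I don't think you need a regular expression to extract the JSON fragment from the log. Reading the file line by line and collecting everything between the line that ends with { and the line that ends with } is enough: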
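import json

with open('origin.log') as f:
    sj = ''  # accumulates the text of the JSON object being collected
    for l in f:
        l = l.rstrip()
        if l.endswith('{'):
            # A line ending in '{' starts a new JSON object.
            sj = '{'
        elif sj:
            if l.endswith('}'):
                # Closing brace: the object is complete, so parse it.
                sj += '\n}'
                js = json.loads(sj)
                print(js['transactionId'])
                sj = ''
            else:
                # Drop the logcat prefix; keep only the text after the first '):'.
                sj += '\n' + l.split('):', 1)[-1]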
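If the logged JSON can contain nested objects, stopping at the first line that ends with } would cut the object short. The sketch below is one possible variation, not part of the original answer: it strips the log prefix with a regular expression and counts braces to find where each object ends. The names PREFIX and extract_json_objects are hypothetical, the prefix pattern is an assumption based on the sample log in the question, and the brace counting assumes no { or } appears inside string values.

```python
import json
import re

# Assumed prefix shape, taken from the sample log: "MM-DD HH:MM:SS.mmm D/tag(pid): "
PREFIX = re.compile(
    r'^\d{1,2}-\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}\s+\w\s*/\s*[^()]*\(\s*\d+\):\s?'
)


def extract_json_objects(lines):
    """Yield each JSON object found in an iterable of log lines."""
    buffer = []  # payload lines of the object currently being collected
    depth = 0    # current brace nesting level
    for line in lines:
        payload = PREFIX.sub('', line.rstrip('\n'))
        if depth == 0 and not payload.lstrip().startswith('{'):
            continue  # outside any object, and no object starts on this line
        buffer.append(payload)
        # Naive brace counting: assumes braces never occur inside string values.
        depth += payload.count('{') - payload.count('}')
        if depth <= 0:
            try:
                yield json.loads('\n'.join(buffer))
            except json.JSONDecodeError:
                pass  # skip fragments that turn out not to be valid JSON
            buffer, depth = [], 0
```

It could be used as `with open('origin.log') as f: objects = list(extract_json_objects(f))`, after which each element is an ordinary dict, for example `objects[0]['transactionId']`.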