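I have a log file that looks like the one shown below.
The file of course has many more lines, alternating between single-line entries and entries whose JSON payload spans several lines.
What I want is a script that reads the file and, whenever an entry contains JSON, automatically dumps that JSON onto a single line.
So it would look like this: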
>>> 2017-08-02 08:51:45 +0200 [INFO] from com.sun.metro.assembler in application-siaServiceImplPort-context-362552 - MASM0007: No application metro.xml configuration file found.
>>> 2017-08-02 08:53:06 +0200 [INFO] from application in application-akka.actor.default-dispatcher-362046 - LOG_EVENT: {
"event" : "sxxxxxdd",
"ts" : "2017-xx
"svc" : "dxx.tlc-1",
"rexxxt" : {
"ts" : "2017-xxxx2:00",
"xx" : "73478c0f-dc70-46b7-a388-d12f7b8aa91e",
"xxxx" : "/xxx/xxx",
"xxx" : "POST",
"user_agent" : "xxx/6.2.1 xxxx/7.38.0 xxx/7.0xx16-1~xxx+8.1",
"user_id" : 39,
"xxx_ip" : "xxxx.1",
"xxxx" : "xxxxx",
"xx" : "xx",
"app_id" : "d4da4385a8204be2949ed62323231443",
"axxe" : "POxxkout"
},
"operation" : {
"scxe" : "checkout",
"rxxxlt" : {
"xxxus" : 2x0
}
},
"xx" : {
"xxx_id" : "CHTO06MLKXP9N",
"xxx_attributes" : {
"xx" : "2017xx6+02:00",
"date_xxxxx" : "2xx7-08xx53:06+02:00",
"xus" : "WAxING",
"dexxion" : "numx0",
"chaxxmount" : 2,
"chaxx_start" : "20x8xx+02:00",
"charge_max_count" : 1,
"merchant" : {
"xxx" : "xxxx",
"xxx" : "xxxxxxx",
"xx" : "xx-x xxxxxl.",
"logo" : "httxxxff0/258xxxjpeg",
"account_type" : "B"
},
"xx_xxx" : "xxxx",
"xxxx_xxx_url" : "https://xxx.xxx.xxx-pay.xx/xxx",
"xxx" : "xxxx",
"xxx" : "xx://dp.xx/uxx10/xxxx"
}
},
"cxx" : "xxxx"
}
I have already tried the following in Python:
infile = "/hoxxxx/application.log"
important = []
keep_phrases = "LOG_EVENT"
>>> 2017-08-02 08:51:45 +0200 [INFO] from com.sun.metro.assembler in application-siaServiceImplPort-context-362552 - MASM0007: No application metro.xml configuration file found.
>>> 2017-08-02 08:53:06 +0200 [INFO] from application in application-akka.actor.default-dispatcher-362046 - LOG_EVENT: {the json here in 1 line}
This does return lines, but of course it has no idea where the JSON ends...
Any help? Thanks.
Answer 0 (score: 0)
You can try combining a regular expression with the json module.
Regular expression:
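import re
# infile is the log path defined in the question
with open(infile) as f:
    text = f.read()
# Drop every newline that is not followed by '>', i.e. one that does not start a new '>>>' entry
print(re.sub(r'\n([^>])', r'\1', text))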
Output:
>>> 2017-08-02 08:51:45 +0200 [INFO] from com.sun.metro.assembler in application-siaServiceImplPort-context-362552 - MASM0007: No application metro.xml configuration file found.
>>> 2017-08-02 08:53:06 +0200 [INFO] from application in application-akka.actor.default-dispatcher-362046 - LOG_EVENT: {"event" : "sxxxxxdd","ts" : "2017-xx"svc" : "dxx.tlc-1","rexxxt" : {"ts" : "2017-xxxx2:00","xx" : "73478c0f-dc70-46b7-a388-d12f7b8aa91e","xxxx" : "/xxx/xxx","xxx" : "POST","user_agent" : "xxx/6.2.1 xxxx/7.38.0 xxx/7.0xx16-1~xxx+8.1","user_id" : 39,"xxx_ip" : "xxxx.1","xxxx" : "xxxxx","xx" : "xx","app_id" : "d4da4385a8204be2949ed62323231443","axxe" : "POxxkout"},"operation" : {"scxe" : "checkout","rxxxlt" : {"xxxus" : 2x0}},"xx" : {"xxx_id" : "CHTO06MLKXP9N","xxx_attributes" : {"xx" : "2017xx6+02:00","date_xxxxx" : "2xx7-08xx53:06+02:00","xus" : "WAxING","dexxion" : "numx0","chaxxmount" : 2,"chaxx_start" : "20x8xx+02:00","charge_max_count" : 1,"merchant" : {"xxx" : "xxxx","xxx" : "xxxxxxx","xx" : "xx-x xxxxxl.","logo" : "httxxxff0/258xxxjpeg","account_type" : "B"},"xx_xxx" : "xxxx","xxxx_xxx_url" : "https://xxx.xxx.xxx-pay.xx/xxx","xxx" : "xxxx","xxx" : "xx://dp.xx/uxx10/xxxx"}},"cxx" : "xxxx"}
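For a large log it may be preferable not to read the whole file into memory. The following is a minimal sketch of a line-by-line variant, assuming (as the regex above does) that every log entry starts with `>>>` and that any other line continues the previous entry. The function name `join_continuation_lines` and the sample data are made up for illustration.

```python
def join_continuation_lines(lines, marker=">>>"):
    """Yield one string per log entry.

    A line starting with `marker` opens a new entry; every other line is
    appended to the entry currently being built.
    """
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(marker):
            # A new entry begins, so the previous one (if any) is complete.
            if current is not None:
                yield current
            current = line
        elif current is None:
            # Continuation text before any entry header: keep it as its own entry.
            current = line
        else:
            current += line
    if current is not None:
        yield current


# Hypothetical miniature log, shaped like the one in the question.
sample = [
    ">>> first record\n",
    ">>> second - LOG_EVENT: {\n",
    '"event" : "a",\n',
    '"n" : 1\n',
    "}\n",
    ">>> third record\n",
]
records = list(join_continuation_lines(sample))
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly, for example by looping over `join_continuation_lines(f)` inside `with open(infile) as f:` and printing each entry.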
If you want the JSON documents as Python objects, you can also do this:
import json
import re
# Join continuation lines as above, then parse the {...} span found on each line
text2 = re.sub(r'\n([^>])', r'\1', text)
js = [json.loads(x) for x in re.findall(r'\{.*\}', text2)]
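The question notes that a plain line filter cannot tell where the JSON ends. One way to let the parser find the end itself is `json.JSONDecoder.raw_decode`, which returns the parsed object together with the index where it stopped. The sketch below re-serializes the payload compactly with `json.dumps`. It assumes the payload directly follows the text `LOG_EVENT: `; the helper name `compact_event` is hypothetical.

```python
import json


def compact_event(record, tag="LOG_EVENT: "):
    """Return `record` with the JSON after `tag` re-serialized on one line.

    Records without the tag, or whose payload is not valid JSON, are
    returned unchanged.
    """
    head, sep, payload = record.partition(tag)
    if not sep:
        return record  # no LOG_EVENT marker in this record
    try:
        # raw_decode parses one JSON value and reports where it ended.
        obj, end = json.JSONDecoder().raw_decode(payload)
    except json.JSONDecodeError:
        return record  # payload is not valid JSON, leave it alone
    compact = json.dumps(obj, separators=(",", ":"))
    return head + sep + compact + payload[end:]


# Hypothetical multi-line entry, shaped like the one in the question.
entry = '>>> ts [INFO] - LOG_EVENT: {\n"event" : "a",\n"op" : {"n" : 1}\n}'
one_line = compact_event(entry)
```

Unlike the regex, this normalizes the spacing inside the JSON and only touches payloads that actually parse. One caveat: parsing into a `dict` keeps only the last value for a repeated key, so a payload with duplicate keys would lose data when re-serialized.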