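I need to analyze logs in jsonlines format. They are stored compressed (.gz). I need to read each line from a file (each line is a valid JSON record) and move the records with a matching "event" key to another file.
With the jsonlines module I am able to read the file, but as soon as any JSON record (that is, any line in the file) is missing the "event" key, it raises an error. Not every JSON record in the file has an "event" key.
Here is sample content from one of the files: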
{"Product":"apple","event":"login","timestamp":"2018-09-27T17:35:55.835Z","version":2}
{"Product":"apple","timestamp":"2018-09-27T17:35:55.835Z","Id":"faf91826-ebc9-4242-996f-d52969bec2d5","version":2}
{"Product":"apple","event":"LandingPage","timestamp":"2018-09-27T17:14:22.998Z","Id":"88016b33-72d7-458e-8de8-f76241f4b681","version":2}
{"Product":"apple","event":"LandingPage","timestamp":"2018-09-27T17:38:55.835Z","version":2}
{"Product":"apple","event":"login","timestamp":"2018-09-27T17:37:55.835Z","version":2}
import jsonlines

with jsonlines.open('/Users/logfile') as reader:
    for obj in reader:
        # dict.get is a method, so it is called with parentheses.
        # It returns 'default' instead of raising when "event" is missing.
        value1 = obj.get('event', 'default')
        print(value1)
        if value1 == 'login':
            # this record should end up in login.jsonl
            pass
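A note on the key lookup, as a minimal sketch: `dict.get` is a method, so it has to be called with parentheses. Subscripting it with square brackets raises a `TypeError`, which an `except NameError` clause would not catch. The `record` dict and the `event_of` helper below are hypothetical names used only for illustration.

```python
def event_of(record, default="default"):
    """Return the record's "event" value, or `default` when the key is absent."""
    return record.get("event", default)


record = {"Product": "apple", "version": 2}  # a record with no "event" key

try:
    record.get["event", "default"]  # square brackets on a method object
except TypeError as err:
    error_name = type(err).__name__
    print(error_name)

print(event_of(record))              # falls back to the default
print(event_of({"event": "login"}))  # returns the stored value
```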
After parsing the roughly 1k jsonlines files generated each day, my output should consist of three files containing event-specific JSON records:
login.jsonl (with "event":"login")
LandingPage.jsonl (with "event":"LandingPage")
original source file (with missing "event" key)
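The split described above could be sketched roughly as follows. This is one possible approach, not a definitive one: it swaps the jsonlines module for the standard-library `gzip` and `json` modules so the .gz input can be read directly, and it assumes the records without a listed "event" are written to a separate leftover file rather than rewritten into the source in place. The function name `split_by_event` and all paths are hypothetical.

```python
import gzip
import json
from contextlib import ExitStack


def split_by_event(src_path, out_paths, rest_path):
    """Route each JSON line of a gzip file to an output chosen by its "event".

    out_paths maps an event name to an output path. Records whose "event"
    is missing or not listed go to rest_path. Returns a count per event,
    with None as the key for the leftover records.
    """
    counts = {}
    with ExitStack() as stack:
        # "rt" decompresses and decodes, yielding one text line per record.
        src = stack.enter_context(gzip.open(src_path, "rt", encoding="utf-8"))
        # Append mode, so many source files can feed the same outputs.
        outs = {
            event: stack.enter_context(open(path, "a", encoding="utf-8"))
            for event, path in out_paths.items()
        }
        rest = stack.enter_context(open(rest_path, "a", encoding="utf-8"))
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            event = record.get("event")  # None when the key is absent
            key = event if event in outs else None
            target = outs[key] if key is not None else rest
            target.write(json.dumps(record) + "\n")
            counts[key] = counts.get(key, 0) + 1
    return counts
```

A call might look like `split_by_event("logfile.gz", {"login": "login.jsonl", "LandingPage": "LandingPage.jsonl"}, "no_event.jsonl")`, repeated for each of the daily files. Opening the outputs in append mode is a design choice that lets all source files accumulate into the same three outputs; it also means the outputs should be cleared before a re-run.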