Separating records with a specific key/value pair out of a jsonlines-format file

Time: 2019-03-28 22:53:54

Tags: python json jsonlines

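I have to analyze logs in jsonlines format. They are stored compressed (.gz). I need to read each line of a file (each line is a valid JSON record) and move the records whose "event" key matches a given value into another file.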
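
The reading step described above can be sketched with the standard library alone. This is a minimal sketch, assuming the files are gzip-compressed UTF-8 text with one JSON object per line; `iter_records` is a hypothetical helper name, not something from the post.

```python
import gzip
import json


def iter_records(path):
    """Yield one decoded object per non-empty line of a gzip-compressed JSON Lines file."""
    # "rt" opens the compressed stream in text mode, so iteration yields str lines.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                yield json.loads(line)
```

Because this is a generator, only one record is held in memory at a time, which may matter when processing many log files per day.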

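With the jsonlines module I am able to read the file, but as soon as any JSON record (that is, any line in the file) lacks the "event" key, an error is raised. Not every JSON record in the file has an "event" key.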
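
For the missing-key problem, one common approach is `dict.get`, which returns a default instead of raising `KeyError`. The sketch below assumes each record has already been decoded into a `dict` (which is what iterating a jsonlines reader yields for JSON objects); `event_of` is a hypothetical helper name.

```python
def event_of(record, default=None):
    """Return the record's "event" value, or `default` when the key is absent."""
    # record["event"] would raise KeyError for records without the key;
    # record.get("event", default) returns the default instead.
    return record.get("event", default)


with_event = {"Product": "apple", "event": "login", "version": 2}
without_event = {"Product": "apple", "version": 2}

print(event_of(with_event))
print(event_of(without_event))
```

Note that `get` is a method and is called with parentheses; writing `obj.get['event', 'default']` with square brackets fails for every record, whether or not the key is present.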

Below is sample content from one file:

{"Product":"apple","event":"login","timestamp":"2018-09-27T17:35:55.835Z","version":2}
{"Product":"apple","timestamp":"2018-09-27T17:35:55.835Z","Id":"faf91826-ebc9-4242-996f-d52969bec2d5","version":2}
{"Product":"apple","event":"LandingPage","timestamp":"2018-09-27T17:14:22.998Z","Id":"88016b33-72d7-458e-8de8-f76241f4b681","version":2}
{"Product":"apple","event":"LandingPage","timestamp":"2018-09-27T17:38:55.835Z","version":2}
{"Product":"apple","event":"login","timestamp":"2018-09-27T17:37:55.835Z","version":2}
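
This is my current attempt:

import jsonlines

with jsonlines.open('/Users/logfile') as reader:
    for obj in reader:
        # Each obj is already a decoded dict, so json.loads(obj) is not needed.
        # dict.get is a method call (parentheses, not square brackets) and
        # returns 'default' when the "event" key is missing.
        value1 = obj.get('event', 'default')
        print(value1)
        if value1 == 'login':
            pass  # the handling of matching records was not shown in the post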

After parsing the roughly 1k jsonlines files produced each day, my output should consist of three files holding the event-specific JSON records:

login.jsonl (with "event":"login")   
LandingPage.jsonl (with "event":"LandingPage")   
original source file (with missing "event" key) 
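
The split described above could be sketched as follows. This is a sketch under several assumptions rather than a definitive solution: the input is gzip-compressed UTF-8, outputs are appended to so that many input files can feed the same output files, and records with no "event" key (or with an event other than the listed ones) go to a separate file instead of being left in the source. `split_by_event` and the file name `remaining.jsonl` are hypothetical.

```python
import gzip
import json
import os


def split_by_event(src_path, out_dir, events=("login", "LandingPage")):
    """Route each record to <event>.jsonl in out_dir; all others go to remaining.jsonl.

    Returns a dict mapping output file name to the number of records written.
    """
    os.makedirs(out_dir, exist_ok=True)
    names = {event: event + ".jsonl" for event in events}
    rest_name = "remaining.jsonl"  # hypothetical name for unmatched records
    handles, counts = {}, {}
    try:
        with gzip.open(src_path, "rt", encoding="utf-8") as src:
            for line in src:
                if not line.strip():
                    continue  # ignore blank lines
                record = json.loads(line)
                # Tolerate records that are not objects or whose event is not a string.
                event = record.get("event") if isinstance(record, dict) else None
                if isinstance(event, str) and event in names:
                    name = names[event]
                else:
                    name = rest_name
                if name not in handles:
                    # Append mode: repeated calls accumulate into the same files.
                    handles[name] = open(os.path.join(out_dir, name), "a", encoding="utf-8")
                # Write the original line so the record text is preserved as-is.
                handles[name].write(line if line.endswith("\n") else line + "\n")
                counts[name] = counts.get(name, 0) + 1
    finally:
        for fh in handles.values():
            fh.close()
    return counts
```

Calling `split_by_event` once per input file would accumulate all "login" records in `login.jsonl` and all "LandingPage" records in `LandingPage.jsonl`. Because outputs are opened in append mode, running it twice on the same input duplicates records, so the caller is assumed to track which files were already processed.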

0 answers:

No answers yet