Question

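I need to analyze logs stored in jsonlines format. The files are compressed (.gz). I need to read each line of a file (every line is a valid JSON record) and move the records with a matching "event" key into another file.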

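With the jsonlines module I am able to read the file, but it raises an error whenever a JSON record (that is, any line in the file) is missing the "event" key. Not every JSON record in the file has an "event" key.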
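To illustrate the missing-key case, here is a minimal sketch that uses only the standard json module. It assumes each line parses to a JSON object (a dict); the helper name event_of and the two shortened records are hypothetical. It relies on dict.get returning None when the key is absent, where indexing with record["event"] would raise KeyError.

```python
import json

# Two shortened records shaped like the log lines: one with "event", one without.
lines = [
    '{"Product":"apple","event":"login","version":2}',
    '{"Product":"apple","version":2}',
]


def event_of(line):
    """Return the record's "event" value, or None when the key is absent."""
    record = json.loads(line)
    # dict.get never raises for a missing key; it returns None by default.
    return record.get("event")


events = [event_of(line) for line in lines]
print(events)
```

A record without "event" yields None here, so it can be routed separately instead of stopping the loop.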

Here is sample content from one of the files:

{"Product":"apple","event":"login","timestamp":"2018-09-27T17:35:55.835Z","version":2}
{"Product":"apple","timestamp":"2018-09-27T17:35:55.835Z","Id":"faf91826-ebc9-4242-996f-d52969bec2d5","version":2}
{"Product":"apple","event":"LandingPage","timestamp":"2018-09-27T17:14:22.998Z","Id":"88016b33-72d7-458e-8de8-f76241f4b681","version":2}
{"Product":"apple","event":"LandingPage","timestamp":"2018-09-27T17:38:55.835Z","version":2}
{"Product":"apple","event":"login","timestamp":"2018-09-27T17:37:55.835Z","version":2}

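import jsonlines


with jsonlines.open('/Users/logfile') as reader:
    for obj in reader:
        try:
            # obj is already a parsed dict here, so json.loads(obj) is not needed.
            # dict.get is a method call: use parentheses, not square brackets.
            value1 = obj.get('event', 'default')
            print(value1)
            if value1 == 'login':
                pass  # TODO: write obj to login.jsonl
        except AttributeError as ee:
            # Raised if a line holds a non-object value (e.g. a list), which has no .get
            print(type(ee))
            print(ee)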

After parsing the roughly 1k jsonlines files produced each day, my output should consist of three files containing the event-specific JSON records:

login.jsonl (with "event":"login")   
LandingPage.jsonl (with "event":"LandingPage")   
original source file (with missing "event" key)
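The split described above could be sketched roughly as follows, using only the standard library (gzip and json) in place of the jsonlines module. The function name split_by_event and the output name no_event.jsonl are assumptions made for illustration. The sketch writes records without a listed event to a separate file and leaves the source untouched, which is a swap for "keeping them in the original file". The demo builds its own small gzipped input from shortened sample records in a temporary directory.

```python
import gzip
import json
import os
import tempfile


def split_by_event(src_gz, out_dir, events=("login", "LandingPage")):
    """Append each record of a gzipped JSON Lines file to <event>.jsonl,
    or to no_event.jsonl when "event" is missing or not listed."""
    counts = {}
    handles = {}
    try:
        # "rt" makes gzip decompress and decode to text line by line.
        with gzip.open(src_gz, "rt", encoding="utf-8") as reader:
            for line in reader:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)
                event = record.get("event")  # None when the key is absent
                name = event if event in events else "no_event"
                if name not in handles:
                    path = os.path.join(out_dir, name + ".jsonl")
                    # Append mode lets many source files feed the same outputs.
                    handles[name] = open(path, "a", encoding="utf-8")
                handles[name].write(json.dumps(record) + "\n")
                counts[name] = counts.get(name, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts


# Demo on a temporary gzipped file built from shortened sample records.
sample = [
    {"Product": "apple", "event": "login", "version": 2},
    {"Product": "apple", "Id": "faf91826-ebc9-4242-996f-d52969bec2d5", "version": 2},
    {"Product": "apple", "event": "LandingPage", "version": 2},
    {"Product": "apple", "event": "LandingPage", "version": 2},
    {"Product": "apple", "event": "login", "version": 2},
]
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "logfile.gz")
    with gzip.open(src, "wt", encoding="utf-8") as f:
        for rec in sample:
            f.write(json.dumps(rec) + "\n")
    counts = split_by_event(src, tmp)
print(counts)
```

Calling split_by_event once per source file with the same out_dir accumulates all of a day's files into the same outputs, since each output is opened in append mode.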

Separating records with a specific key/value pair from a jsonlines format file

0 Answers: