Question

来自东京的问候。

让我解释一下我想用python 2.7实现的目标：

我在每一行都有一个带有 JSON Dict 的文件，这是一个捕获：

1 {"res":0, "res_message":"OK", "debug_info":{"id-info":"9089"}, "visits":[{"id":"237000080507750613","siteId":1551642,"startTime":1483217576324,"endTime":1483217696000,"clientIPs":["69.61.12.70"],"country"    :["United States"],"countryCode":["US"],"clientType":"Vulnerability Scanner","clientApplication":"Grabber","clientApplicationId":780,"httpVersion":"1.1","clientApplicationVersion":"null","userAgent":"Mozi    lla/5.0 CommonCrawler Node 3AEHGF7VNEKJUWOPKJJIJ7ODKPM4XXVZQUTHNWS5B2O5AEAGHIG4HVC42LLEUSO.CQYXO3ZFD.GB5RZ5EG2SRWW335PUSOSIVLZUXPCTJUGV2MDJGQJDJPE5UH.cdn0.common.crawl.zone","os":"","osVersion":"","suppor    tsCookies":false,"supportsJavaScript":false,"hits":1,"pageViews":0,"entryReferer":"","servedVia":["Ashburn,VA"],"securitySummary": {"api.threats.bot_access_control":1},"actions":[{"postData":"","requestResult":"api.request_result.req_blocked_security","isSecured":false,"responseTime":0,"thinkTime":0,"incidentId":"237000080507750613-304992946328    764549","threats":[{"securityRule":"api.threats.bot_access_control","alertLocation":"api.alert_location.alert_location_path","attackCodes":["200.0"],"securityRuleAction    ":"api.rule_action_type.rule_action_block"}]}]}, ...

2 {"res":0, "res_message":"OK", "debug_info":{"id-info":"9089"}, "visits":[{"id":"520000110618442601","siteId":1551642,"startTime":1482666233524,"endTime":1482666353000,"clientIPs":["93.175.201.18"],"countr    y":["Ukraine"],"countryCode":["UA"],"clientType":"Spam Bot","clientApplication":"DTS Agent","clientApplicationId":99,"httpVersion":"1.1","clientApplicationVersion":"null","userAgent":"Mozilla/4.0 (compati    ble; MSIE 5.0; Windows NT; DigExt; DTS Agent","os":"","osVersion":"","supportsCookies":false,"supportsJavaScript":false,"hits":1,"pageViews":0,"entryReferer":"","served    Via":["Warsaw, Poland"],"securitySummary":{"api.threats.bot_access_control":1},"actions":[{"postData":"","requestResult":"api.request_result.req_blocked_security","isSecured":false,"responseTime":2,"thinkTime":1,"incidentId":"520000110618442601-1233371267206742195","threats":[{"securityRule":"api.threats.bot_access_control","alertLocation":"api.alert_location.alert_location_path","attackCodes":["200.0"],"securityRuleAction":"api.rule_action_type.rule_action_block"}]}]}, ...

3 {"res":0, "res_message":"OK", "debug_info":{"id-info":"9089"}, "visits":[{"id":"520000110602830007","siteId":1551642,"startTime":1482429957001,"endTime":1482430077000,"clientIPs":["93.175.201.18"],"countr    y":["Ukraine"],"countryCode":["UA"],"clientType":"Spam Bot","clientApplication":"DTS Agent","clientApplicationId":99,"httpVersion":"1.1","clientApplicationVersion":"null","userAgent":"Mozilla/4.0 (compati    ble; MSIE 5.0; Windows NT; DigExt; DTS Agent","os":"","osVersion":"","supportsCookies":false,"supportsJavaScript":false,"hits":1,"pageViews":0,"entryReferer":"","served    Via":["Warsaw, Poland"],"securitySummary":{"api.threats.bot_access_control":1},"actions":[{"postData":"","requestResult":"api.request_result.req_blocked_security","isSecured":false","responseTime":4,"thinkTime":4,"incidentId":"520000110602830007-3073954101470953658","threats":[{"securityRule":"api.threats.bot_access_control","alertLocation":"api.alert_location.alert_location_path","attackCodes":["200.0"],"securityRuleAction":"api.rule_action_type.rule_action_block"}]}]}, ...

我尝试使用json.loads()继续整个文件，但没有任何成功。

这是我的代码

g = open('monthlyLogShort.txt', 'w')
with open("page.txt") as f:
         data = f.read()
         parse = json.loads(data)        # <-load the JSON dict
         field_list = parse["visits"]
         for fields in field_list:       # <-extract the the following field
                 print >> g , "visit_id=",(fields["id"]),",","src_country=",(fields["country"]),",", "event_timestamp=",(fields["startTime"]),",","src_ip=",(fields["clientIPs"]),",","dest_name=", rwdname,"    ,","dest_id=",(fields["siteId"]),",","signature=",(fields["securitySummary"])
g.close()

您可以想象，我只能使用此代码解析一行。进行整个文件的最佳（pythonic）方法是什么？

感谢您的阅读

Answer 1

整个文件不是有效的JSON，但您可以逐行解析：

with open("page.txt") as f:
    for line in f:
        obj = json.loads(line.split(" ", 1)[1])
        print(obj["visits"])

Answer 2

因为，我想出了这个解决方案的行数总是一样的：

g = open('monthlyLogShort.txt', 'w')
with open('page.txt','r') as f:
        data = f.readlines()
        countp = 0
        page = 0
        while countp < 10:
                parse = json.loads(data[page])  # load the JSON dict
                field_list = parse["visits"]
                for fields in field_list:       # extract the the following field
                        print >> g , "visit_id=",(fields["id"]),",","src_country=",(fields["country"]),",", "event_timestamp=",(fields["startTime"]),",","src_ip=",(fields["clientIPs"]),",","dest_name=", dname,",","dest_id=",(fields["siteId"]),",","signature=",(fields['securitySummary'])
                countp = countp + 1
                page = page + 1
        else:
                g.close()

它就像一个魅力。

Python：每行有一个JSON obj的文件

2 个答案: