Reading, replacing, and writing a large JSON file

Time: 2019-03-21 14:13:07

Tags: python json

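What I'm trying to do is load a JSON file (containing many objects), iterate over each object, and replace the dashes with something else. For now I'm just substituting the string "TEST" to see whether it works.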

get_file = open("ntp-2019-03-13T1600", "r")
json_data=""

for obj in get_file:
    json_data = json_data + json.dumps(obj).replace("-", "TEST")

get_file.close()

new_file = open("formatted_ntp/zzz", "w+")
new_file.write(json.loads(json_data))
new_file.close()

Running this code, I get this error:

>     JSONDecodeError                           Traceback (most recent call last)
>     <ipython-input-26-cf175a001140> in <module>()
>          30 
>          31 new_file = open("formatted_ntp/zzz", "w+")
>     ---> 32 new_file.write(json.loads(json_data))
>          33 new_file.close()
>          34 
>     
>     ~/anaconda3/lib/python3.6/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant,
> object_pairs_hook, **kw)
>         352             parse_int is None and parse_float is None and
>         353             parse_constant is None and object_pairs_hook is None and not kw):
>     --> 354         return _default_decoder.decode(s)
>         355     if cls is None:
>         356         cls = JSONDecoder
>     
>     ~/anaconda3/lib/python3.6/json/decoder.py in decode(self, s, _w)
>         340         end = _w(s, end).end()
>         341         if end != len(s):
>     --> 342             raise JSONDecodeError("Extra data", s, end)
>         343         return obj
>         344 
>     
>     JSONDecodeError: Extra data: line 1 column 348 (char 347)

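Now, if I take json_data = json_data + json.dumps(obj).replace("-", "TEST") out of the for loop, above json_data="", the newly formatted file is written successfully, but with only the first object! The file has roughly 100,000 objects, and I need to do the same thing to all of them.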

Edit: the objects have the following form:

{"af":4,"dst_name":"pool.ntp.org","from":"2.183.50.198","fw":4960,"group_id":2048605,"lts":-1,"msm_id":2048605,"msm_name":"Ntp","prb_id":33714,"proto":"UDP","result":[{"error":"name resolution failed: non-recoverable failure in name resolution (1)"}],"timestamp":1552493066,"ttr":5623.624915,"type":"ntp"}
{"af":4,"dst_addr":"193.0.0.229","dst_name":"193.0.0.229","from":"2.183.50.198","fw":4960,"group_id":2048606,"li":"no","lts":-1,"mode":"server","msm_id":2048606,"msm_name":"Ntp","poll":8,"prb_id":33714,"precision":0.0000038147,"proto":"UDP","ref-id":"GPS","ref-ts":3761485970.0366811752,"result":[{"x":"*"},{"final-ts":3761482699.4551954269,"offset":-3274.989631,"origin-ts":3761482694.3486270905,"receive-ts":3761485971.891418457,"rtt":5.106322,"transmit-ts":3761485971.8916659355},{"x":"*"}],"root-delay":0,"root-dispersion":0.00102234,"src_addr":"10.5.50.240","stratum":1,"timestamp":1552493894,"type":"ntp","version":4}
{"af":4,"dst_name":"pool.ntp.org","from":"2.183.50.198","fw":4960,"group_id":2048605,"lts":-1,"msm_id":2048605,"msm_name":"Ntp","prb_id":33714,"proto":"UDP","result":[{"error":"name resolution failed: non-recoverable failure in name resolution (1)"}],"timestamp":1552493962,"ttr":12032.946445,"type":"ntp"}
{"af":4,"dst_addr":"193.0.0.229","dst_name":"193.0.0.229","from":"2.183.50.198","fw":4960,"group_id":2048606,"lts":-1,"msm_id":2048606,"msm_name":"Ntp","prb_id":33714,"proto":"UDP","result":[{"x":"*"},{"x":"*"},{"x":"*"}],"src_addr":"10.5.50.240","timestamp":1552494794,"type":"ntp"}
{"af":4,"dst_name":"pool.ntp.org","from":"2.183.50.198","fw":4960,"group_id":2048605,"lts":-1,"msm_id":2048605,"msm_name":"Ntp","prb_id":33714,"proto":"UDP","result":[{"error":"name resolution failed: non-recoverable failure in name resolution (1)"}],"timestamp":1552494860,"ttr":954.17154,"type":"ntp"}

1 answer:

Answer 0 (score: 1)

You're using json.loads and json.dumps backwards. json.loads is for parsing a JSON string into an object; json.dumps is for converting an object into a JSON string.
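To make the direction of the two calls concrete, here is a small sketch. The sample line is a shortened, made-up record in the same shape as the question's data. It shows that calling json.dumps on a line of text only wraps that text in a JSON string literal, and that concatenating several such literals is what makes json.loads report "Extra data".

```python
import json

# One line as it would come out of the file: still plain text (a str).
# This record is a shortened, hypothetical stand-in for the real data.
line = '{"lts": -1, "type": "ntp"}\n'

# json.dumps on a str does not parse it; it quotes and escapes it,
# producing a JSON string literal.
encoded = json.dumps(line)
print(encoded)

# Two literals back to back are not a single JSON document, so
# json.loads parses the first one and then complains about the rest.
try:
    json.loads(encoded + encoded)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg

# The intended direction: loads turns text into a dict,
# dumps turns the dict back into text.
record = json.loads(line)
round_tripped = json.dumps(record)
```

This matches the traceback in the question: the reported position (char 347) would be where the first quoted line ends and the second begins.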

But you don't need either of them here; you can just operate on the strings you read from the file.

with open("ntp-2019-03-13T1600", "r") as get_file, open("formatted_ntp/zzz", "w") as new_file:
    for line in get_file:
        new_file.write(line.replace("-", "TEST"))

Note that this can produce invalid JSON in the new file. If the original JSON contains a negative number such as -1, it becomes TEST1.
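A quick way to see the problem, using a shortened, made-up line with the same kind of fields as the question's data (a negative number and a hyphenated key):

```python
import json

# Hypothetical, shortened record: "lts" is negative and "ref-id" has a dash.
line = '{"lts":-1,"ref-id":"GPS"}'

# Blanket text replacement touches numbers and keys as well as string values.
replaced = line.replace("-", "TEST")
print(replaced)

# The bare token TEST1 is not a valid JSON value, so parsing fails.
try:
    json.loads(replaced)
    is_valid = True
except json.JSONDecodeError:
    is_valid = False
```

Besides breaking the number, the replacement also renames the key ref-id, which may or may not be what you want.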

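To avoid this problem, you need to parse each line with json.loads(), then recursively search for every value that is a string and do the replacement only in those values. Then use json.dumps() to convert the object back to JSON and write it to the file, followed by a newline.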
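The steps just described could be sketched roughly as follows. The helper names replace_in_strings and convert_lines are made up for illustration, and the sketch assumes one JSON object per line, as in the sample data. Dictionary keys are deliberately left alone, since only string values are supposed to change.

```python
import json


def replace_in_strings(value, old, new):
    """Return a copy of value with old replaced by new in string values only."""
    if isinstance(value, str):
        return value.replace(old, new)
    if isinstance(value, list):
        return [replace_in_strings(item, old, new) for item in value]
    if isinstance(value, dict):
        # Keys are kept unchanged; only the values are visited.
        return {key: replace_in_strings(item, old, new)
                for key, item in value.items()}
    # Numbers, booleans, and None pass through untouched.
    return value


def convert_lines(lines, old="-", new="TEST"):
    """Yield one rewritten JSON line per non-blank input line."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        yield json.dumps(replace_in_strings(record, old, new)) + "\n"


# Possible usage with the file names from the question (not run here):
#
# with open("ntp-2019-03-13T1600", "r") as get_file, \
#         open("formatted_ntp/zzz", "w") as new_file:
#     new_file.writelines(convert_lines(get_file))
```

Because the file is processed one line at a time, memory use stays small even with about 100,000 objects. One caveat: numbers pass through Python's float type, so values with many digits, such as 3761485970.0366811752, may be written back with fewer digits, and json.dumps adds spaces after separators by default.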