Question

So this is the standard way to read a JSON file in Python:

import json
from pprint import pprint

with open('ig001.json') as data_file:    
    data = json.load(data_file)

pprint(data)

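However, the JSON file I want to read contains multiple JSON objects. So it looks something like:

[{}, {} .... ]

This represents 2 JSON objects, and inside each object, within each {}, there are a bunch of key:value pairs.

So when I try to read this with the standard code above, I get an error: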

Traceback (most recent call last):
  File "jsonformatter.py", line 5, in <module>
    data = json.load(data_file)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 290, in load
    **kw)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 3889 column 2 - line 719307 column 2 (char 164691 - 30776399)

Line 3889 is where the first JSON object ends and the next one starts; the line itself looks like "][".
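
The error can be reproduced on a small scale. The following is a minimal sketch in which a made-up two-array string stands in for the real file; it suggests that the parser reads the first array without trouble and then rejects everything that follows it.

```python
import json

# Hypothetical stand-in for the file: two top-level arrays back to back.
sample = '[{"a": 1}, {"b": 2}]\n[{"c": 3}]'

try:
    json.loads(sample)
    error_message = None
except ValueError as exc:  # on Python 3, json.JSONDecodeError subclasses ValueError
    error_message = str(exc)

# Prints the "Extra data" complaint, pointing at the start of the second array.
print(error_message)
```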

Any ideas on how to fix this would be greatly appreciated, thanks!

Answer 1

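Without a link to your JSON file, I'm going to have to make some assumptions:

1. The top-level JSON arrays are not each on their own line (since the first parse error is on line 3889), so we can't easily read the file one line at a time.
2. This is the only kind of invalid JSON present in the file.

To fix this: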

import json

# 1. replace instances of `][` with `]<SPLIT>[`
# (`<SPLIT>` needs to be something that is not present anywhere in the file to begin with)

with open('ig001.json') as data_file:
    raw_data = data_file.read()  # we're going to need the entire file in memory
tweaked_data = raw_data.replace('][', ']<SPLIT>[')

# 2. split the string into an array of strings, using the chosen split indicator

split_data = tweaked_data.split('<SPLIT>')

# 3. load each string individually

parsed_data = [json.loads(bit_of_data) for bit_of_data in split_data]

(Pardon the horrible variable names)
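
An alternative that avoids choosing a split marker is to let the decoder report where each value ends. This is only a sketch and not the approach described above: `iter_json_values` and the sample string are made up for illustration, and it still assumes the whole file fits in memory. It relies on `json.JSONDecoder.raw_decode`, which returns the parsed value together with the index at which parsing stopped.

```python
import json


def iter_json_values(text):
    """Yield each top-level JSON value found back to back in `text`."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos == length:
            break
        # Returns the decoded value and the index just past its end.
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Hypothetical sample: three top-level arrays, with and without a newline between them.
sample = '[{"a": 1}, {"b": 2}][{"c": 3}]\n[]'
parsed_data = list(iter_json_values(sample))
print(parsed_data)
```

With the real file you would pass the result of `data_file.read()` instead of `sample`. Unlike the string replacement, this should not be thrown off by a `][` that happens to occur inside a string value.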
