My JSON file contains the following data:
{
    "first": {
        "name": "James",
        "age": 30
    },
    "second": {
        "name": "Max",
        "age": 30
    },
    "third": {
        "name": "Norah",
        "age": 30
    },
    "fourth": {
        "name": "Sam",
        "age": 30
    }
}
I want to print the top-level keys and their objects as follows:
import json
import ijson

fname = "data.json"

# Python 2: read the whole file, then decode it in one go.
with open(fname) as f:
    raw_data = f.read()

data = json.loads(raw_data)

for k in data.keys():
    print k, data[k]
Output:
second {u'age': 30, u'name': u'Max'}
fourth {u'age': 30, u'name': u'Sam'}
third {u'age': 30, u'name': u'Norah'}
first {u'age': 30, u'name': u'James'}
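So far, so good. But if I want to do the same thing with a huge file, I have to read the whole file into memory first. That is very slow and needs a lot of memory.
I would like to use an incremental JSON parser (ijson in this case) to achieve what I described above.
The code below is taken from: No access to top level elements with ijson?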
with open(fname) as f:
    json_obj = ijson.items(f, '').next()  # '' loads everything as only one object.
    for (key, value) in json_obj.items():
        print key + " -> " + str(value)
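This is not suitable either, because it also reads the whole file into memory. It is not truly incremental.
How can I incrementally parse the top-level keys and their corresponding objects of a JSON file in Python?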
Answer 0 (score: 0)
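Since JSON files are basically text files, consider stripping out the top level as a string. Basically, iterate over the file line by line, concatenating each line onto a string, and break out of the loop once the string contains the double braces }} that mark the end of the top level. Of course, the double-brace check has to ignore whitespace and line breaks.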
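import json

toplevelstring = ''
with open('data.json') as f:
    for line in f:
        # Test a whitespace-free copy so that "}\n}" also counts as "}}".
        # (str.replace('\s+', '') would look for the literal text, not a regex.)
        if '}}' not in ''.join(toplevelstring.split()):
            toplevelstring += line
        else:
            break

data = json.loads(toplevelstring)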
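Now, if your larger JSON is wrapped in square brackets or another pair of curly braces, still run the routine above, but add the line below to slice off the first character, [, and the last two characters, which are the comma and line break after the top level's final curly brace: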
[{
    "first": {
        "name": "James",
        "age": 30
    },
    "second": {
        "name": "Max",
        "age": 30
    },
    "third": {
        "name": "Norah",
        "age": 30
    },
    "fourth": {
        "name": "Sam",
        "age": 30
    }
},
{
    "data1": {
        "id": "AAA",
        "type": 55
    },
    "data2": {
        "id": "BBB",
        "type": 1601
    },
    "data3": {
        "id": "CCC",
        "type": 817
    }
}]
...
toplevelstring = toplevelstring[1:-2]
data = json.loads(toplevelstring)
Answer 1 (score: 0)
Answer from a GitHub issue [file name changed]:

import ijson
from ijson.common import ObjectBuilder


def objects(file):
    key = '-'
    for prefix, event, value in ijson.parse(file):
        if prefix == '' and event == 'map_key':  # found new object at the root
            key = value  # mark the key value
            builder = ObjectBuilder()
        elif prefix.startswith(key):  # while at this key, build the object
            builder.event(event, value)
            if event == 'end_map':  # found the end of an object at the current key, yield
                yield key, builder.value


for key, value in objects(open('data.json', 'rb')):
    print(key, value)
Answer 2 (score: 0)
Since version 2.6, ijson ships with a kvitems function that does exactly this.