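I had never parsed a JSON file before, until last week when I was given this task: use a Python script to read a 23 MB JSON file and store some specific data in a CSV. I have spent the last few days searching for how to parse it and have looked at various Python implementations, but none of them worked in my case. Here is a sample of the JSON objects in the file: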
{
"created": "2017-01-19T04:39:41.012",
"expired": "2017-01-21T04:39:41.012",
"id": "0000e0be-d2c6-4a89-ad37-8f71d0dd9e9a",
"mixed": false,
"pool_id": "189591",
"reward": 0.5,
"status": "EXPIRED",
"task_suite_id": "f1aa98d6-ff25-4dde-81f5-2587ccbe36af",
"tasks": [
{
"id": "ffbc4048-cc5a-4578-b0d9-0705a588b55d",
"input_values": {
"address-ru": "\u0420\u043e\u0441\u0441\u0438\u044f, \u0421\u0432\u0435\u0440\u0434\u043b\u043e\u0432\u0441\u043a\u0430\u044f \u043e\u0431\u043b\u0430\u0441\u0442\u044c, \u041f\u0435\u0440\u0432\u043e\u0443\u0440\u0430\u043b\u044c\u0441\u043a, 1-\u044f \u041f\u0438\u043b\u044c\u043d\u0430\u044f \u0443\u043b\u0438\u0446\u0430",
"company-id": "1542916387",
"coordinates": "56.91969408920,60.03087172680",
"country": "RU",
"language": "RU",
"name-ru": "\u0421\u0443\u043f\u0435\u0440\u043c\u0430\u0440\u043a\u0435\u0442",
"org-weight": "30",
"rubric": [
{
"name-ru": "\u0421\u0443\u043f\u0435\u0440\u043c\u0430\u0440\u043a\u0435\u0442",
"rubric-id": 184108079
}
]
}
}
],
"user_id": "165684b434e6390fb8da262978601397"
},
{
"created": "2017-02-24T16:08:10.280",
"expired": "2017-02-26T16:08:10.280",
"id": "0001b81e-dbcc-4de3-985d-4397b97dbffa",
"mixed": false,
"pool_id": "189591",
"reward": 0.5,
"status": "EXPIRED",
"task_suite_id": "5dcbbd70-e570-4026-8246-a30bb462f35d",
"tasks": [
{
"id": "90437e00-d15c-4679-b7be-6d3660efdbce",
"input_values": {
"address-ru": "\u041c\u043e\u0441\u043a\u043e\u0432\u0441\u043a\u0430\u044f \u043e\u0431\u043b., \u041a\u043e\u0440\u043e\u043b\u0435\u0432, \u043c\u0438\u043a\u0440\u043e\u0440\u0430\u0439\u043e\u043d \u0412\u0430\u043b\u0435\u043d\u0442\u0438\u043d\u043e\u0432\u043a\u0430, \u0443\u043b. \u0413\u043e\u0440\u044c\u043a\u043e\u0433\u043e, 12, \u043a\u043e\u0440\u043f.\u0412",
"company-id": "662316782",
"coordinates": "55.915326,37.869891",
"country": "RU",
"language": "RU",
"meta": [
{
"permlink-id": 1119957838
}
],
"name-ru": "\u041d\u0435\u0430\u0442\u044d\u043b",
"org-weight": "30",
"rubric": [
{
"name-ru": "\u0420\u0435\u043c\u043e\u043d\u0442 \u0438\u0437\u043c\u0435\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u044b\u0445 \u043f\u0440\u0438\u0431\u043e\u0440\u043e\u0432",
"rubric-id": 184106846
},
{
"name-ru": "\u0412\u043e\u0434\u043e\u0441\u0447\u0435\u0442\u0447\u0438\u043a\u0438, \u0433\u0430\u0437\u043e\u0441\u0447\u0435\u0442\u0447\u0438\u043a\u0438, \u0442\u0435\u043f\u043b\u043e\u0441\u0447\u0435\u0442\u0447\u0438\u043a\u0438",
"rubric-id": 184106834
},
{
"name-ru": "\u041e\u0442\u043e\u043f\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435 \u043e\u0431\u043e\u0440\u0443\u0434\u043e\u0432\u0430\u043d\u0438\u0435 \u0438 \u0441\u0438\u0441\u0442\u0435\u043c\u044b",
"rubric-id": 184107475
}
]
}
}
],
"user_id": "0ba1f0e613c9b1db5fcbddd342e44a15"
},
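...and so on, for hundreds of thousands of lines.
If I manually remove the whitespace and commas between the JSON objects, this code (which I found on Stack Overflow) seems to work: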
import json

json_objects = []

def stream_read_json(file):
    start_pos = 0
    while True:
        try:
            # Succeeds only when the rest of the file is a single JSON value
            obj = json.load(file)
            yield obj
            return
        except json.JSONDecodeError as e:
            # e.pos is where the "extra data" begins, so re-read just the first object
            file.seek(start_pos)
            json_str = file.read(e.pos)
            obj = json.loads(json_str)
            start_pos += e.pos
            yield obj

with open('task1.json', 'r') as source:
    objCount = 0
    for data in stream_read_json(source):
        json_objects.append(data)
        objCount += 1
        print('Added ' + str(objCount) + 'th json object.')
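But I can't find anywhere how to get rid of that whitespace and those commas while reading the JSON file. Even more frustrating, I can't find any tutorial or manual on how to write a JSON parser in Python for different situations, so that I could work this out myself without bothering Stack Overflow.
Any hints and ideas would be greatly appreciated. Thanks in advance.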
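For reference, here is one direction that might avoid the manual cleanup. It is only a rough sketch: it assumes the whole file fits in memory (23 MB should) and that the objects are separated by nothing but whitespace and commas. It uses the standard library's `json.JSONDecoder.raw_decode`, which parses one value starting at a given index and returns the index where it stopped. The function name `iter_json_objects` and the `sample` string are made up for illustration.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value found in text.

    Assumption: values are separated only by whitespace and commas.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # Skip the separators between objects (raw_decode does not skip them itself)
        while pos < end and text[pos] in ' \t\r\n,':
            pos += 1
        if pos >= end:
            return
        # Parse one value starting at pos; the second return value is the index just past it
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Hypothetical miniature of the file layout: objects followed by "," and a newline
sample = '{"id": "a", "reward": 0.5},\n{"id": "b", "reward": 0.5},\n'
for obj in iter_json_objects(sample):
    print(obj["id"])
```

With the real file, the text would presumably come from something like `open('task1.json', encoding='utf-8').read()`. If the file contains anything other than whitespace and commas between objects, `raw_decode` raises `json.JSONDecodeError` at that position.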