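I have a JSON file that I want to flatten. If the file contains only one message, the code works fine, but when it contains multiple messages I get the following error: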
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 39 column 1 (char 952)
Sample JSON file:
{
"number": "Abc",
"date": "01.10.2016",
"name": "R 3932",
"locations": [
{
"depTimeDiffMin": "0",
"name": "Spital am Pyhrn Bahnhof",
"arrTime": "",
"depTime": "06:32",
"platform": "2",
"stationIdx": "0",
"arrTimeDiffMin": "",
"track": "R 3932"
},
{
"depTimeDiffMin": "0",
"name": "Windischgarsten Bahnhof",
"arrTime": "06:37",
"depTime": "06:40",
"platform": "2",
"stationIdx": "1",
"arrTimeDiffMin": "1",
"track": ""
},
{
"depTimeDiffMin": "",
"name": "Linz/Donau Hbf",
"arrTime": "08:24",
"depTime": "",
"platform": "1A-B",
"stationIdx": "22",
"arrTimeDiffMin": "1",
"track": ""
}
]
}
{
"number": "Xyz",
"date": "01.10.2016",
"name": "R 3932",
"locations": [
{
"depTimeDiffMin": "0",
"name": "Spital am Pyhrn Bahnhof",
"arrTime": "",
"depTime": "06:32",
"platform": "2",
"stationIdx": "0",
"arrTimeDiffMin": "",
"track": "R 3932"
},
{
"depTimeDiffMin": "0",
"name": "Windischgarsten Bahnhof",
"arrTime": "06:37",
"depTime": "06:40",
"platform": "2",
"stationIdx": "1",
"arrTimeDiffMin": "1",
"track": ""
},
{
"depTimeDiffMin": "",
"name": "Linz/Donau Hbf",
"arrTime": "08:24",
"depTime": "",
"platform": "1A-B",
"stationIdx": "22",
"arrTimeDiffMin": "1",
"track": ""
}
]
}
My code:
import json
import pandas as pd
import numpy as np
from pandas.io.json import json_normalize

desired_width = 500
pd.set_option('display.width', desired_width)
np.set_printoptions(linewidth=desired_width)
pd.set_option('display.max_columns', 100)

with open('C:/Users/username/Desktop/samplejson.json') as f:
    data = json.load(f)

def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

for data in data:
    flat = flatten_json(data)
    new_flat = json_normalize(flat)
    dfs = pd.DataFrame(new_flat)
    print(dfs.head(2))
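I am trying to parse the whole JSON file and load all of the data into a DataFrame so that I can start using it for analysis. If the file contains only one message, the code works and the resulting table is very wide, with a lot of columns.
If the JSON file contains multiple messages, I get the error shown above. I have looked at many solutions on Stack Overflow, but none of them seem to work.
Is there a simpler way to read and flatten a JSON file? I tried pandas' json_normalize, but it only flattens one level.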
Answer 0: (score: 0)
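If the file contains only one message, it is valid json; but once there are more messages (the way they are placed one after another), it is no longer valid json ([JSON]: Introducing JSON). Example: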
>>> json.loads("{}")
{}
>>> json.loads("{} {}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Install\x64\Python\Python\03.06.08\Lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "c:\Install\x64\Python\Python\03.06.08\Lib\json\decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 4 (char 3)
>>> json.loads("[{}, {}]")
[{}, {}]
For more details, check [Python 3]: json - JSON encoder and decoder.
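The simplest way to have valid json that contains multiple messages is to enclose them in square brackets ("[" and "]") and separate them with commas (","), just like in the "locations" sub-messages.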
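If the file cannot be regenerated as a proper array, another option is to decode the concatenated objects one at a time with json.JSONDecoder.raw_decode from the standard library. The following is only a minimal sketch: the helper name iter_json_objects and the two-message sample string are made up for illustration, and it assumes the whole file fits in memory as one string.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the decoded value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical input: two messages placed one after another, as in the question.
sample = '{"number": "Abc"}\n{"number": "Xyz"}'
messages = list(iter_json_objects(sample))
print([m["number"] for m in messages])
```

For a real file, the string would come from f.read() instead of the sample literal; the resulting list can then be handled exactly like the output of json.load on a valid array.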
Answer 1: (score: 0)
You can do it like this, assuming j is the complete JSON object (a list of messages).
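import pandas as pd

def parse(j):
    # Build one frame per message: top-level fields next to its locations.
    for item in j:
        data = pd.DataFrame([{k: v for k, v in item.items() if k != 'locations'}])
        locs = pd.DataFrame(item.get('locations'))
        # Forward-fill so the top-level fields repeat on every location row.
        yield pd.concat([data, locs], axis=1).ffill()

pd.concat(parse(j), axis=0, ignore_index=True)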
date name number arrTime ... name platform stationIdx track
0 01.10.2016 R 3932 Abc ... Spital am Pyhrn Bahnhof 2 0 R 3932
1 01.10.2016 R 3932 Abc 06:37 ... Windischgarsten Bahnhof 2 1
2 01.10.2016 R 3932 Abc 08:24 ... Linz/Donau Hbf 1A-B 22
3 01.10.2016 R 3932 Xyz ... Spital am Pyhrn Bahnhof 2 0 R 3932
4 01.10.2016 R 3932 Xyz 06:37 ... Windischgarsten Bahnhof 2 1
5 01.10.2016 R 3932 Xyz 08:24 ... Linz/Donau Hbf 1A-B 22
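Your JSON is invalid because the "," that separates the two objects is missing (and the objects also need to be wrapped in "[" and "]" so that they form a single array).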
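The same one-row-per-location layout can also be built with plain Python before anything is handed to pandas. This is a rough sketch under a few assumptions: the function name location_rows and the "loc_" prefix are invented here, and the sample data is a trimmed-down version of the question's messages with most fields omitted.

```python
def location_rows(messages):
    """Return one flat dict per entry in each message's "locations" list."""
    rows = []
    for message in messages:
        # Top-level fields (number, date, name) are repeated on every row.
        header = {k: v for k, v in message.items() if k != "locations"}
        for location in message.get("locations", []):
            row = dict(header)
            # Prefix location keys so the station "name" does not
            # overwrite the train "name".
            row.update({"loc_" + k: v for k, v in location.items()})
            rows.append(row)
    return rows


# Trimmed-down sample in the shape of the question's data.
messages = [
    {"number": "Abc", "name": "R 3932",
     "locations": [{"name": "Spital am Pyhrn Bahnhof", "depTime": "06:32"},
                   {"name": "Linz/Donau Hbf", "depTime": ""}]},
    {"number": "Xyz", "name": "R 3932",
     "locations": [{"name": "Spital am Pyhrn Bahnhof", "depTime": "06:32"}]},
]
rows = location_rows(messages)
print(len(rows))
```

Passing rows to pd.DataFrame(rows) should then give one DataFrame row per station stop, without the duplicate "name" column that appears in the concat-based output above.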