如何将我的JSON数据放入合理的数据框?我有一个深度嵌套的文件,我的目标是进入一个大型数据框。下面的Github存储库中的所有内容都是described:
答案 0 :(得分:1)
使用嵌套的jsons,您需要遍历各个级别,提取所需的段。对于较大json的营养部分,考虑迭代每个nutritionPortions
级别,每次运行pandas规范化并连接到最终数据帧:
import pandas as pd
import json
with open('/Users/simongraham/Desktop/Kaido/Data/kaidoData.json') as f:
data = json.load(f)
# INITIALIZE DF
nutrition = pd.DataFrame()
# ITERATIVELY CONCATENATE
for item in data[0]["nutritionPortions"]:
if 'ftEnergyKcal' in item.keys(): # MISSING IN 3 OF 53 LEVELS
temp = (pd.io
.json
.json_normalize(item, 'nutritionNutrients',
['vcNutritionId','vcUserId','vcPortionId','vcPortionName','vcPortionSize',
'ftEnergyKcal', 'vcPortionUnit','dtConsumedDate'])
)
nutrition = pd.concat([nutrition, temp])
nutrition.head()
输出
ftValue nPercentRI vcNutrient vcNutritionPortionId \
0 0.00 0.0 alcohol c993ac30-ecb4-4154-a2ea-d51dbb293f66
1 0.00 0.0 bcfa c993ac30-ecb4-4154-a2ea-d51dbb293f66
2 7.80 6.0 biotin c993ac30-ecb4-4154-a2ea-d51dbb293f66
3 49.40 2.0 calcium c993ac30-ecb4-4154-a2ea-d51dbb293f66
4 1.82 0.0 carbohydrate c993ac30-ecb4-4154-a2ea-d51dbb293f66
vcTrafficLight vcUnit dtConsumedDate \
0 g 2016-04-12T00:00:00
1 g 2016-04-12T00:00:00
2 µg 2016-04-12T00:00:00
3 mg 2016-04-12T00:00:00
4 g 2016-04-12T00:00:00
vcNutritionId ftEnergyKcal \
0 070b97a4-d562-427d-94a8-1de1481df5d1 18.2
1 070b97a4-d562-427d-94a8-1de1481df5d1 18.2
2 070b97a4-d562-427d-94a8-1de1481df5d1 18.2
3 070b97a4-d562-427d-94a8-1de1481df5d1 18.2
4 070b97a4-d562-427d-94a8-1de1481df5d1 18.2
vcUserId vcPortionName vcPortionSize \
0 fe585e3d-2863-46fe-a41f-290bf58ad169 1 mug 260
1 fe585e3d-2863-46fe-a41f-290bf58ad169 1 mug 260
2 fe585e3d-2863-46fe-a41f-290bf58ad169 1 mug 260
3 fe585e3d-2863-46fe-a41f-290bf58ad169 1 mug 260
4 fe585e3d-2863-46fe-a41f-290bf58ad169 1 mug 260
vcPortionId vcPortionUnit
0 2 ml
1 2 ml
2 2 ml
3 2 ml
4 2 ml