I have now added the current problem to GitHub; the repo URL is below. I have also included a Jupyter notebook that explains the problem. Thank you.
https://github.com/simongraham/dataExplore.git
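I am currently working with nutrition data for a project. The data is in raw JSON format, and I want to use Python and pandas to turn it into an understandable data frame. I know this is a simple task when the JSON is not nested. In that case I would use: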
nutrition = pd.read_json('data')
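However, my data is nested, and I am finding it hard to get it into a sensible data frame. The JSON format is shown below; the nutritionNutrients element is itself nested. This nested element describes the content of various nutrients, such as alcohol and bcfa. I have included only one example entry because the data file is large.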
[
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutritionId": "2476378b-79ee-4857-a81d-489661a039a1",
"vcUserId": "cc51145b-5a70-4344-9b55-1a4455f0a9d2",
"vcPortionId": "1",
"vcPortionName": "1 average pepper",
"vcPortionSize": "20",
"ftEnergyKcal": 5.2,
"vcPortionUnit": "g",
"dtConsumedDate": "2016-05-04T00:00:00",
"nutritionNutrients": [
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutrient": "alcohol",
"ftValue": 0,
"vcUnit": "g",
"nPercentRI": 0,
"vcTrafficLight": ""
},
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutrient": "bcfa",
"ftValue": 0,
"vcUnit": "g",
"nPercentRI": 0,
"vcTrafficLight": ""
},
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutrient": "biotin",
"ftValue": 0,
"vcUnit": "µg",
"nPercentRI": 0,
"vcTrafficLight": ""
},
...
]
}
]
Any help would be greatly appreciated.
Thanks.
.... ....
Now that I have found out how to solve this with json_normalize, I am back with the same problem, but this time my data is nested twice, i.e.:
[
{
...,
"nutritionPortions": [
"nutritionPortions": [
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutritionId": "2476378b-79ee-4857-a81d-489661a039a1",
"vcUserId": "cc51145b-5a70-4344-9b55-1a4455f0a9d2",
"vcPortionId": "1",
"vcPortionName": "1 average pepper",
"vcPortionSize": "20",
"ftEnergyKcal": 5.2,
"vcPortionUnit": "g",
"dtConsumedDate": "2016-05-04T00:00:00",
"nutritionNutrients": [
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutrient": "alcohol",
"ftValue": 0,
"vcUnit": "g",
"nPercentRI": 0,
"vcTrafficLight": ""
},
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutrient": "bcfa",
"ftValue": 0,
"vcUnit": "g",
"nPercentRI": 0,
"vcTrafficLight": ""
},
{
"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
"vcNutrient": "biotin",
"ftValue": 0,
"vcUnit": "µg",
"nPercentRI": 0,
"vcTrafficLight": ""
},
...
]
}
]
}
]
When I have a JSON that contains only the nutrition data, I can use:
nutrition = (pd.io
.json
.json_normalize(data, 'nutritionNutrients',
['vcNutritionId','vcUserId','vcPortionId','vcPortionName','vcPortionSize',
'ftEnergyKcal','vcPortionUnit','dtConsumedDate'])
)
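However, my data does not contain only nutrition information. For example, it also contains activity information, so the nutrition information is nested under "nutritionPortions" at the top level. Let us assume that all the other columns are not nested and are represented by "Activity" and "Wellbeing".
If I use the code: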
nutrition = (pd.io
.json
.json_normalize(data, ['nutritionPortions'])
)
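I am back to the original problem where "nutritionNutrients" is nested, and I have not managed to obtain the corresponding data frame.
Thanks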
Answer 0 (score: 4)
UPDATE: this works for your kaidoData.json file:
df = (pd.io
.json
.json_normalize(data[0]['nutritionPortions'], 'nutritionNutrients',
['vcNutritionId','vcUserId','vcPortionId','vcPortionName','vcPortionSize',
'dtCreatedDate','dtUpdatedDate','nProcessingStatus',
'vcPortionUnit','dtConsumedDate'
]
)
)
PS: I don't know what is wrong with 'ftEnergyKcal'; it throws:
KeyError: 'ftEnergyKcal'
It may be missing in some portions.
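If the key really is absent from some portions, one possible workaround is the `errors='ignore'` argument of json_normalize, which fills a missing meta key with NaN instead of raising. The sketch below is only an illustration: the `portions` list is invented sample data (the second entry deliberately lacks 'ftEnergyKcal'), and it assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`.

```python
import pandas as pd

# Hypothetical sample data: the second portion has no 'ftEnergyKcal' key.
portions = [
    {
        "vcPortionName": "1 average pepper",
        "ftEnergyKcal": 5.2,
        "nutritionNutrients": [
            {"vcNutrient": "alcohol", "ftValue": 0, "vcUnit": "g"},
            {"vcNutrient": "biotin", "ftValue": 0, "vcUnit": "µg"},
        ],
    },
    {
        "vcPortionName": "portion without energy value",
        "nutritionNutrients": [
            {"vcNutrient": "alcohol", "ftValue": 0, "vcUnit": "g"},
        ],
    },
]

# errors='ignore' stores NaN for meta keys that are missing
# instead of raising KeyError.
df = pd.json_normalize(
    portions,
    record_path="nutritionNutrients",
    meta=["vcPortionName", "ftEnergyKcal"],
    errors="ignore",
)
print(df)
```

Rows that come from the portion without the key then carry NaN in the 'ftEnergyKcal' column, so they can be inspected or filtered afterwards.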
OLD answer:
(pd.io
.json
.json_normalize(l, 'nutritionNutrients',
['vcNutritionId','vcUserId','vcPortionId','vcPortionName','vcPortionSize',
'ftEnergyKcal','vcPortionUnit','dtConsumedDate'])
)
Demo:
In [107]: (pd.io
.....: .json
.....: .json_normalize(l, 'nutritionNutrients',
.....: ['vcNutritionId','vcUserId','vcPortionId','vcPortionName','vcPortionSize',
.....: 'ftEnergyKcal','vcPortionUnit','dtConsumedDate'])
.....: )
Out[107]:
ftValue nPercentRI vcNutrient vcNutritionPortionId vcTrafficLight ... vcPortionSize \
0 0 0 alcohol 478d1905-f264-4d... ... 20
1 0 0 bcfa 478d1905-f264-4d... ... 20
2 0 0 biotin 478d1905-f264-4d... ... 20
vcNutritionId vcPortionId ftEnergyKcal vcPortionName
0 2476378b-79ee-48... 1 5.2 1 average pepper
1 2476378b-79ee-48... 1 5.2 1 average pepper
2 2476378b-79ee-48... 1 5.2 1 average pepper
[3 rows x 14 columns]
where l is your list (parsed JSON)
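For the doubly nested layout from the question, json_normalize also accepts a list as the record path, so every top-level element can be flattened rather than only data[0]. The following is a minimal sketch under assumptions: the `data` list is a trimmed-down, invented sample, the "Activity" value is made up to stand in for the non-nested top-level fields, and `pd.json_normalize` assumes pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical, trimmed-down sample in the doubly nested layout.
# "Activity" stands in for the non-nested top-level fields; its value is invented.
data = [
    {
        "Activity": "walking",
        "nutritionPortions": [
            {
                "vcPortionName": "1 average pepper",
                "vcPortionSize": "20",
                "nutritionNutrients": [
                    {"vcNutrient": "alcohol", "ftValue": 0, "vcUnit": "g"},
                    {"vcNutrient": "bcfa", "ftValue": 0, "vcUnit": "g"},
                ],
            }
        ],
    }
]

df = pd.json_normalize(
    data,
    # Walk into nutritionPortions first, then into nutritionNutrients.
    record_path=["nutritionPortions", "nutritionNutrients"],
    meta=[
        "Activity",                              # top-level field
        ["nutritionPortions", "vcPortionName"],  # portion-level fields
        ["nutritionPortions", "vcPortionSize"],
    ],
)
print(df.columns.tolist())
```

Portion-level meta columns come out prefixed, for example 'nutritionPortions.vcPortionName'. A key that exists both in the nutrient records and in the portion (such as 'vcNutritionPortionId') is best left out of the meta list or renamed, since clashing column names raise an error.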