I have a JSON file:
response = {
    "classifier_id": "xxxxx-xx-1",
    "url": "/testers/xxxxx-xx-1",
    "collection": [
        {
            "text": "How hot will it be today?",
            "top_class": "temperature",
            "classes": [
                {
                    "class_name": "temperature",
                    "confidence": 0.993
                },
                {
                    "class_name": "conditions",
                    "confidence": 0.006
                }
            ]
        },
        {
            "text": "Is it hot outside?",
            "top_class": "temperature",
            "classes": [
                {
                    "class_name": "temperature",
                    "confidence": 1.0
                },
                {
                    "class_name": "conditions",
                    "confidence": 0.0
                }
            ]
        }
    ]
}
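I tried converting it, but the result contained duplicates.
How can I convert this JSON file to a Pandas DataFrame?
Each record in the collection should be expanded wide, not long.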
Answer 0 (score: 0)
If json_normalize() does not work for your JSON structure, you can parse it with custom logic. Here is an example:
import json
import pandas as pd

# define dictionary with desired structure
d = {
    'text': [],
    'top_class': [],
    'temperature': [],
    'conditions': []
}
# load json (only needed if response is a JSON string rather than a dict)
data = json.loads(response) if isinstance(response, str) else response
# iterate over collection and extract elements needed
for el in data['collection']:
    d['text'].append(el['text'])
    d['top_class'].append(el['top_class'])
    # confidence of the 'temperature' class
    d['temperature'].append([e['confidence'] for e in el['classes'] if e['class_name'] == 'temperature'][0])
    # confidence of the 'conditions' class
    d['conditions'].append([e['confidence'] for e in el['classes'] if e['class_name'] == 'conditions'][0])
df = pd.DataFrame(d)
df.head()
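For comparison, here is a rough sketch of how the duplicates mentioned in the question can arise and one way to reshape them. It assumes the question's data is already a Python dict named response, and that json_normalize was called with record_path="classes"; neither detail is confirmed by the question. The names long_df and wide_df are illustrative only.

```python
import pandas as pd

# Sample data copied from the question (assumed to be a dict, not a JSON string).
response = {
    "classifier_id": "xxxxx-xx-1",
    "url": "/testers/xxxxx-xx-1",
    "collection": [
        {"text": "How hot will it be today?", "top_class": "temperature",
         "classes": [{"class_name": "temperature", "confidence": 0.993},
                     {"class_name": "conditions", "confidence": 0.006}]},
        {"text": "Is it hot outside?", "top_class": "temperature",
         "classes": [{"class_name": "temperature", "confidence": 1.0},
                     {"class_name": "conditions", "confidence": 0.0}]},
    ],
}

# Long format: one row per (record, class) pair, so "text" and "top_class"
# are repeated for every class of the same record.
long_df = pd.json_normalize(
    response["collection"], record_path="classes", meta=["text", "top_class"]
)

# Wide format: move each class_name into its own column.
wide_df = (
    long_df.set_index(["text", "top_class", "class_name"])["confidence"]
    .unstack("class_name")
    .reset_index()
    .rename_axis(columns=None)
)
print(wide_df)
```

This reshaping only works if each class_name appears at most once per record; otherwise unstack raises an error because the index is not unique.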
Answer 1 (score: 0)
Using a flatten_json function, the 'collection' records can be expanded into a wide DataFrame.
Use pandas.DataFrame.rename to rename the column headers.
df = pd.DataFrame([flatten_json(x) for x in response['collection']])
# display(df)
                        text    top_class  classes_0_class_name  classes_0_confidence  classes_1_class_name  classes_1_confidence
0  How hot will it be today?  temperature           temperature                 0.993            conditions                 0.006
1         Is it hot outside?  temperature           temperature                 1.000            conditions                 0.000
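The answer does not include the definition of flatten_json, so the version below is only a hypothetical stand-in: a small recursive helper that joins nested keys and list positions with underscores, which is enough to produce column names like those shown above. The rename mapping is also an assumption and relies on the classes appearing in the same order in every record, as they do in the sample data.

```python
import pandas as pd

def flatten_json(nested, parent_key="", sep="_"):
    """Hypothetical helper: flatten nested dicts and lists into one flat dict."""
    if isinstance(nested, dict):
        items = nested.items()
    elif isinstance(nested, list):
        items = enumerate(nested)  # list positions become part of the key
    else:
        return {parent_key: nested}  # scalar leaf
    flat = {}
    for key, value in items:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        flat.update(flatten_json(value, new_key, sep))
    return flat

# Sample data copied from the question (only the part used here).
response = {
    "collection": [
        {"text": "How hot will it be today?", "top_class": "temperature",
         "classes": [{"class_name": "temperature", "confidence": 0.993},
                     {"class_name": "conditions", "confidence": 0.006}]},
        {"text": "Is it hot outside?", "top_class": "temperature",
         "classes": [{"class_name": "temperature", "confidence": 1.0},
                     {"class_name": "conditions", "confidence": 0.0}]},
    ],
}

df = pd.DataFrame([flatten_json(x) for x in response["collection"]])

# Illustrative rename; the new header names are not from the original answer.
renamed = df.rename(columns={
    "classes_0_confidence": "temperature_confidence",
    "classes_1_confidence": "conditions_confidence",
})
print(renamed)
```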