How to flatten JSON into a wide format in pandas

Time: 2020-09-19 09:19:38

Tags: python json pandas dataframe normalize

I have a JSON file.

Current output: [image not available]

Code and undesired output: [image not available]

This is the data, stored in response:

response = {
    "classifier_id": "xxxxx-xx-1",
    "url": "/testers/xxxxx-xx-1",
    "collection": [
        {
            "text": "How hot will it be today?",
            "top_class": "temperature",
            "classes": [
                {"class_name": "temperature", "confidence": 0.993},
                {"class_name": "conditions", "confidence": 0.006}
            ]
        },
        {
            "text": "Is it hot outside?",
            "top_class": "temperature",
            "classes": [
                {"class_name": "temperature", "confidence": 1.0},
                {"class_name": "conditions", "confidence": 0.0}
            ]
        }
    ]
}

I tried to flatten it, but the result contained duplicates.

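How can I convert this JSON file into a pandas DataFrame?

Each record in collection should be expanded wide (one row per record), not long (one row per class).

Result: [DataFrame image not available]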

2 answers:

Answer 0 (score: 0)

If json_normalize() does not work for your JSON structure, you can parse it with custom logic. Here is an example:

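import json
import pandas as pd

# define dictionary with desired structure
d = {
    'text': [],
    'top_class': [],
    'temperature': [],
    'confidence': []  # note: holds the confidence of the 'conditions' class
}

# load json (parse only if response is a JSON string rather than a dict)
data = json.loads(response) if isinstance(response, str) else response

# iterate over collection and extract the elements needed
for el in data['collection']:
    d['text'].append(el['text'])
    d['top_class'].append(el['top_class'])
    # confidence of the 'temperature' class
    d['temperature'].append([e['confidence'] for e in el['classes'] if e['class_name'] == 'temperature'][0])
    # confidence of the 'conditions' class
    d['confidence'].append([e['confidence'] for e in el['classes'] if e['class_name'] == 'conditions'][0])

df = pd.DataFrame(d)

df.head()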

Output: [image not available]
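
The loop in this answer hard-codes the two class names. As a hedged variation, assuming every entry in classes carries a class_name and a confidence key as in the question's sample, each record can be turned into one dict whose keys are the class names themselves. The names to_wide_row and rows are hypothetical, and the sample data is copied from the question.

```python
# Sample data copied from the question (already a Python dict, not a JSON string).
response = {
    "classifier_id": "xxxxx-xx-1",
    "url": "/testers/xxxxx-xx-1",
    "collection": [
        {
            "text": "How hot will it be today?",
            "top_class": "temperature",
            "classes": [
                {"class_name": "temperature", "confidence": 0.993},
                {"class_name": "conditions", "confidence": 0.006},
            ],
        },
        {
            "text": "Is it hot outside?",
            "top_class": "temperature",
            "classes": [
                {"class_name": "temperature", "confidence": 1.0},
                {"class_name": "conditions", "confidence": 0.0},
            ],
        },
    ],
}


def to_wide_row(record):
    """Return one flat dict per collection record (hypothetical helper)."""
    row = {"text": record["text"], "top_class": record["top_class"]}
    # One key per class name, holding that class's confidence.
    for cls in record["classes"]:
        row[cls["class_name"]] = cls["confidence"]
    return row


# One dict per record, so each record becomes a single wide row.
rows = [to_wide_row(r) for r in response["collection"]]
print(rows[0])
```

Passing rows to pd.DataFrame(rows) should then give one row per record with the columns text, top_class, temperature and conditions, without naming the classes in advance.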

Answer 1 (score: 0)

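import pandas as pd

# flatten_json is not defined in this answer; it is assumed to be a helper that
# recursively flattens nested dicts and lists into a single-level dict,
# joining keys and list indices with "_"
df = pd.DataFrame([flatten_json(x) for x in response['collection']])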

# display(df)
                        text    top_class classes_0_class_name  classes_0_confidence classes_1_class_name  classes_1_confidence
0  How hot will it be today?  temperature          temperature                 0.993           conditions                 0.006
1         Is it hot outside?  temperature          temperature                 1.000           conditions                 0.000
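
The answer calls flatten_json without defining it. A minimal sketch of such a helper, assuming it recursively joins nested dict keys and list indices with an underscore (which matches column names such as classes_0_class_name in the output above), might look like the following. It is an illustration only and not necessarily the function the answerer used; the parent_key and sep parameters are assumptions.

```python
def flatten_json(obj, parent_key="", sep="_"):
    """Recursively flatten nested dicts/lists into a single-level dict (assumed behaviour)."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        # List elements are keyed by their position: 0, 1, 2, ...
        items = enumerate(obj)
    else:
        # A bare scalar has nothing to flatten.
        return {parent_key: obj}

    flat = {}
    for key, value in items:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)):
            # Descend into nested containers, carrying the key prefix along.
            flat.update(flatten_json(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


# One record from the question's collection.
record = {
    "text": "Is it hot outside?",
    "top_class": "temperature",
    "classes": [
        {"class_name": "temperature", "confidence": 1.0},
        {"class_name": "conditions", "confidence": 0.0},
    ],
}

flat = flatten_json(record)
print(list(flat))
```

With a helper like this, each record yields one flat dict, so the list comprehension in the answer produces one wide row per record.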