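Getting keys and values from nested dictionaries inside a list into a DataFrame

Time: 2019-10-11 16:43:03

Tags: python json python-3.x pandas dictionary

I have a large list of nested dictionaries. I am trying to pull specific keys out of the nested dictionaries and turn them into a DataFrame. How can I do that? I know enough about dictionaries to get at the keys, and I tried appending to [] and {}, but the result is not what I want. Any guidance is appreciated!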

import pandas as pd
from pprint import pprint

d = {'Main':{
            'SecondLevel':
                    [{'Identifier':'abc',
                     'StudentInfo':{'Name':'Mike','Grade':'1',
                                    'TeachersAssigned':[{'Name':'Paul'},
                                                        {'Name':'Smith'}
                                                       ]}},
                    {
                     'StudentInfo':{'Name':'Mandy','Grade':'1',
                                    'TeachersAssigned':[{'Name':'Baker'},
                                                        {'Name':'Smith'}
                                                       ]}}]}}
pprint(d)

list_dict = []
for doc in d['Main']['SecondLevel']:
    identifier = '' if doc.get('Identifier') is None else doc['Identifier']
    studentname = doc['StudentInfo']['Name']

    list_dict.append(identifier)
    list_dict.append(studentname)

    for teach in doc['StudentInfo']['TeachersAssigned']:
        teachers_name = teach['Name']

        list_dict.append(teachers_name)

pprint(list_dict)

>>> ['abc', 'Mike', 'Paul', 'Smith', '', 'Mandy', 'Baker', 'Smith']

pd.DataFrame(list_dict)
>>> single column with list of the values from above

I am trying to get it to look like this:

Identifier   StudentInfo    TeachersAssigned
abc          Mike           Paul
abc          Mike           Smith
             Mandy          Baker
             Mandy          Smith

Am I going about the for loop wrong, and should this be a list comprehension instead?

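1 answer:

Answer 0 (score: 1)

Given your dictionary, this is how I would handle it. However, as I explained earlier, a DataFrame cannot contain columns of different lengths, so you can use np.nan wherever Identifier is missing.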

import numpy as np
import pandas as pd

d = {'Main':{
            'SecondLevel':
                    [{'Identifier':'abc',
                     'StudentInfo':{'Name':'Mike','Grade':'1',
                                    'TeachersAssigned':[{'Name':'Paul'},
                                                        {'Name':'Smith'}
                                                       ]}},
                    {
                     'StudentInfo':{'Name':'Mandy','Grade':'1',
                                    'TeachersAssigned':[{'Name':'Baker'},
                                                        {'Name':'Smith'}
                                                       ]}}]}}
data = {'Identifier':[],'Name':[],'TeachersAssigned':[]}
# Emit one row per (student, teacher) pair so every column ends up the same length.
for student in d['Main']['SecondLevel']:
    info = student['StudentInfo']
    for teacher in info['TeachersAssigned']:
        # 'Identifier' is optional; fall back to NaN when it is missing.
        data['Identifier'].append(student.get('Identifier', np.nan))
        data['Name'].append(info['Name'])
        data['TeachersAssigned'].append(teacher['Name'])
df = pd.DataFrame(data)
print(df)

Output:

  Identifier   Name TeachersAssigned
0        abc   Mike             Paul
1        abc   Mike            Smith
2        NaN  Mandy            Baker
3        NaN  Mandy            Smith
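
Since the question asks about a list comprehension, here is a possible alternative, offered as a sketch rather than the one right way: build one record per student with a comprehension, keep the teacher names in a list, and let DataFrame.explode (available since pandas 0.25) expand that list into rows. The variable names records and df are only illustrative, and the column names are chosen to match the output above.

```python
import numpy as np
import pandas as pd

d = {'Main': {
        'SecondLevel': [
            {'Identifier': 'abc',
             'StudentInfo': {'Name': 'Mike', 'Grade': '1',
                             'TeachersAssigned': [{'Name': 'Paul'},
                                                  {'Name': 'Smith'}]}},
            {'StudentInfo': {'Name': 'Mandy', 'Grade': '1',
                             'TeachersAssigned': [{'Name': 'Baker'},
                                                  {'Name': 'Smith'}]}}]}}

# One record per student; the teacher names stay together in a list for now.
records = [
    {
        # 'Identifier' may be absent, so fall back to NaN.
        'Identifier': doc.get('Identifier', np.nan),
        'Name': doc['StudentInfo']['Name'],
        'TeachersAssigned': [t['Name'] for t in doc['StudentInfo']['TeachersAssigned']],
    }
    for doc in d['Main']['SecondLevel']
]

# explode() gives each list element its own row and repeats the other columns.
df = pd.DataFrame(records).explode('TeachersAssigned').reset_index(drop=True)
print(df)
```

This should produce the same four rows as the loop above. The difference is mostly one of taste: the comprehension keeps the per-student logic in one place, while explode handles the repetition of Identifier and Name.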