Convert a 4-level nested dictionary to a pandas DataFrame in Python

Date: 2018-02-03 05:10:21

Tags: python pandas dictionary dataframe nested

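I have a 4-level nested dictionary that I want to convert into a pandas DataFrame (or Panel) so I can export it to CSV. I want every cell to contain information from the nested dictionary.

I have a nested dictionary like the one below; the real data has many more keys and values.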

{
    2008: {
        'Barack Obama': {
            1: {'Author': 'Barack Obama',
                'City': [],
                'Title': 'Keynote Address at the 2004 Democratic National Convention',
                'Type': 'address',
                'Year': 2008},
            2: {'Author': 'Barack Obama',
                'City': ['Springfield'],
                'Title': 'Remarks Announcing Candidacy for President in Springfield, Illinois',
                'Type': 'remarks',
                'Year': 2008},
            3: {'Author': 'Barack Obama',
                'City': ['Chicago'],
                'Title': 'Remarks at the AIPAC Policy Forum in Chicago',
                'Type': 'remarks',
                'Year': 2008},
        },
        'Bill Richardson': {
            1: {'Author': 'Bill Richardson',
                'City': [],
                'Title': 'Iraq Speech to New Hampshire Democratic State Party State Central Committee',
                'Type': 'speech',
                'Year': 2008},
            2: {'Author': 'Bill Richardson',
                'City': [],
                'Title': 'Address to the DNC Winter Meeting',
                'Type': 'address',
                'Year': 2008},
            3: {'Author': 'Bill Richardson',
                'City': [],
                'Title': 'Speech: The New Realism and the Rebirth of American Leadership',
                'Type': 'speech',
                'Year': 2008},
        },
    },
    2012: {
        'Barack Obama': {
            1: {'Author': 'Barack Obama',
                'City': ['Parma'],
                'Title': '535 - Remarks at a Campaign Rally in Parma, Ohio',
                'Type': 'remarks',
                'Year': '2012'},
            2: {'Author': 'Barack Obama',
                'City': ['Sandusky'],
                'Title': '534 - Remarks at a Campaign Rally in Sandusky, Ohio',
                'Type': 'remarks',
                'Year': '2012'},
            3: {'Author': 'Barack Obama',
                'City': [],
                'Title': '533 - Remarks at a Campaign Rally in Maumee, Ohio',
                'Type': 'remarks',
                'Year': '2012'},
        },
    },
}

I want to convert it into this DataFrame:

Year  Author1          No.  Author           City             Title  Type     Year
2008  Barack Obama     1    Barack Obama     []               ....   address  2008
2008  Barack Obama     2    Barack Obama     ['Springfield']  ....   remarks  2008
2008  Barack Obama     3    Barack Obama     ['Chicago']      ....   remarks  2008
...
2008  Bill Richardson  1    Bill Richardson  []               ....   speech   2008
2008  Bill Richardson  2    Bill Richardson  []               ....   address  2008
2008  Bill Richardson  3    Bill Richardson  []               ....   speech   2008
...
2012  Barack Obama     1    Barack Obama     ['Parma']        ....   remarks  2012
2012  Barack Obama     2    Barack Obama     ['Sandusky']     ....   remarks  2012
2012  Barack Obama     3    Barack Obama     []               ....   remarks  2012
...

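I have read some answers that build the DataFrame with for loops, but they produce a merged index in the first column, whereas I want every cell to contain information from the dictionary. Any suggestions? Thanks!

I have already tried the code below. It does not give me what I want: it produces merged index cells in the first column, and it does not handle a 4-level nested dictionary. I modified it by adding another for loop, but the result is indexed by 3-tuples, which is not what I want either.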

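import pandas as pd

# Only two levels are unpacked: the (year, author) tuple keys become a 2-level
# MultiIndex, and each cell holds a whole record dict (columns are numbers 1, 2, 3)
pd.DataFrame.from_dict({(i, j): user_dict[i][j]
                        for i in user_dict.keys()
                        for j in user_dict[i].keys()},
                       orient='index')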
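A minimal sketch of extending that attempt by one more level, assuming pandas is installed and using a trimmed copy of the question's data (the variable names `data` and `flat` are illustrative). With three-part tuple keys, `from_dict(..., orient='index')` builds a 3-level MultiIndex, which is what prints as merged cells or tuples. `reset_index()` then moves those index levels into ordinary columns, so every cell is filled. The level names `Year1`, `Author1` and `No.` are chosen here to match the desired output.

```python
import pandas as pd

# Trimmed sample with the question's shape: year -> author -> number -> record
data = {
    2008: {
        "Barack Obama": {
            1: {"Author": "Barack Obama", "City": [],
                "Title": "Keynote Address at the 2004 Democratic National Convention",
                "Type": "address", "Year": 2008},
            2: {"Author": "Barack Obama", "City": ["Springfield"],
                "Title": "Remarks Announcing Candidacy for President in Springfield, Illinois",
                "Type": "remarks", "Year": 2008},
        },
        "Bill Richardson": {
            2: {"Author": "Bill Richardson", "City": [],
                "Title": "Address to the DNC Winter Meeting",
                "Type": "address", "Year": 2008},
        },
    },
    2012: {
        "Barack Obama": {
            1: {"Author": "Barack Obama", "City": ["Parma"],
                "Title": "535 - Remarks at a Campaign Rally in Parma, Ohio",
                "Type": "remarks", "Year": "2012"},
        },
    },
}

# One entry per innermost record, keyed by a (year, author, number) tuple
flat = {
    (year, author, number): record
    for year, authors in data.items()
    for author, records in authors.items()
    for number, record in records.items()
}

# orient="index": each record dict becomes a row; the tuple keys become a 3-level MultiIndex
df = pd.DataFrame.from_dict(flat, orient="index")

# Name the index levels, then turn them into regular columns (no merged cells)
df = df.rename_axis(["Year1", "Author1", "No."]).reset_index()
print(df)
```

The result has one row per record, with the columns `Year1, Author1, No., Author, City, Title, Type, Year`, and can be written out with `df.to_csv(..., index=False)`.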

2 Answers

Answer 0 (score: 2)

Building the dictionary first, with meaningful variable names, makes it easier to see what is happening:

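import pandas as pd

# One list per output column; data is the nested dict from the question
temp = {}
for year1, values1 in data.items():              # level 1: year
    for author1, values2 in values1.items():     # level 2: author
        for number, values3 in values2.items():  # level 3: record number
            temp.setdefault('Year1', []).append(year1)
            temp.setdefault('Author1', []).append(author1)
            temp.setdefault('No.', []).append(number)
            # Level 4: the record's own fields become columns
            for key, value in values3.items():
                temp.setdefault(key, []).append(value)
print(pd.DataFrame(temp))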

Output:

            Author          Author1           City  No.  \
0     Barack Obama     Barack Obama             []    1   
1     Barack Obama     Barack Obama  [Springfield]    2   
2     Barack Obama     Barack Obama      [Chicago]    3   
3  Bill Richardson  Bill Richardson             []    1   
4  Bill Richardson  Bill Richardson             []    2   
5  Bill Richardson  Bill Richardson             []    3   
6     Barack Obama     Barack Obama        [Parma]    1   
7     Barack Obama     Barack Obama     [Sandusky]    2   
8     Barack Obama     Barack Obama             []    3   



                                               Title     Type  Year  Year1  
0  Keynote Address at the 2004 Democratic Nationa...  address  2008   2008  
1  Remarks Announcing Candidacy for President in ...  remarks  2008   2008  
2       Remarks at the AIPAC Policy Forum in Chicago  remarks  2008   2008  
3  Iraq Speech to New Hampshire Democratic State ...   speech  2008   2008  
4                  Address to the DNC Winter Meeting  address  2008   2008  
5  Speech: The New Realism and the Rebirth of Ame...   speech  2008   2008  
6   535 - Remarks at a Campaign Rally in Parma, Ohio  remarks  2012   2012  
7  534 - Remarks at a Campaign Rally in Sandusky,...  remarks  2012   2012  
8  533 - Remarks at a Campaign Rally in Maumee, Ohio  remarks  2012   2012 

Then create the DataFrame with the column order you want:

df = pd.DataFrame(temp, columns=['Year1', 'Author1',  'No.', 'Author',
                                 'City', 'Title', 'Type', 'Year']) 
df
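A variant of the same loop, sketched under the assumption that each record is a flat dict like the ones in the question: collect one dict per row instead of one list per column, and let pandas line up the keys by name. If a record ever lacks a key, that cell becomes NaN, whereas the list-per-column version would end up with lists of different lengths and `pd.DataFrame(temp)` would raise a `ValueError`. The sample `data` is a trimmed copy of the question's dictionary, and the `to_csv` call covers the CSV export the question asks about.

```python
import pandas as pd

# Trimmed sample with the question's shape: year -> author -> number -> record
data = {
    2008: {
        "Barack Obama": {
            1: {"Author": "Barack Obama", "City": [],
                "Title": "Keynote Address at the 2004 Democratic National Convention",
                "Type": "address", "Year": 2008},
            2: {"Author": "Barack Obama", "City": ["Springfield"],
                "Title": "Remarks Announcing Candidacy for President in Springfield, Illinois",
                "Type": "remarks", "Year": 2008},
        },
        "Bill Richardson": {
            2: {"Author": "Bill Richardson", "City": [],
                "Title": "Address to the DNC Winter Meeting",
                "Type": "address", "Year": 2008},
        },
    },
    2012: {
        "Barack Obama": {
            1: {"Author": "Barack Obama", "City": ["Parma"],
                "Title": "535 - Remarks at a Campaign Rally in Parma, Ohio",
                "Type": "remarks", "Year": "2012"},
        },
    },
}

rows = []
for year1, authors in data.items():
    for author1, records in authors.items():
        for number, record in records.items():
            # Outer keys first, then the record's own fields
            rows.append({"Year1": year1, "Author1": author1, "No.": number, **record})

columns = ["Year1", "Author1", "No.", "Author", "City", "Title", "Type", "Year"]
df = pd.DataFrame(rows, columns=columns)

# With no path, to_csv() returns the CSV text; pass a file name to write a file instead.
# List values in City are written as their Python repr, e.g. ['Springfield'].
csv_text = df.to_csv(index=False)
print(csv_text)
```

If the brackets are unwanted in the CSV, `df['City'].str.join(', ')` turns each list into a plain comma-separated string (an empty list becomes an empty string) before exporting.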


Answer 1 (score: 0)

Using a list comprehension:

df = pd.DataFrame([[k, j, n] + [p for p in m.values()]
                   for k, i in d.items()
                   for j, l in i.items()
                   for n, m in l.items()],
                  columns=['Year', 'Author1', 'No.', 'Author', 'City', 'Title', 'Type', 'Year'])

#    df
#       Year          Author1  No.           Author           City                                                 Title     Type  Year  
#    0  2008     Barack Obama    1     Barack Obama             []     Keynote Address at the 2004 Democratic Nationa...  address  2008  
#    1  2008     Barack Obama    2     Barack Obama  [Springfield]     Remarks Announcing Candidacy for President in ...  remarks  2008  
#    2  2008     Barack Obama    3     Barack Obama      [Chicago]          Remarks at the AIPAC Policy Forum in Chicago  remarks  2008  
#    3  2008  Bill Richardson    1  Bill Richardson             []     Iraq Speech to New Hampshire Democratic State ...   speech  2008  
#    4  2008  Bill Richardson    2  Bill Richardson             []                     Address to the DNC Winter Meeting  address  2008  
#    5  2008  Bill Richardson    3  Bill Richardson             []     Speech: The New Realism and the Rebirth of Ame...   speech  2008  
#    6  2012     Barack Obama    1     Barack Obama        [Parma]     535 - Remarks at a Campaign Rally in Parma, Ohio   remarks  2012  
#    7  2012     Barack Obama    2     Barack Obama     [Sandusky]     534 - Remarks at a Campaign Rally in Sandusky,...  remarks  2012  
#    8  2012     Barack Obama    3     Barack Obama             []     533 - Remarks at a Campaign Rally in Maumee, Ohio  remarks  2012
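The comprehension above depends on `m.values()` returning the fields in exactly the order Author, City, Title, Type, Year. That holds here only because every record was created with the same key order (dicts preserve insertion order from Python 3.7). It also yields two columns both named `Year`. A sketch that reads the fields by name instead, assuming every record uses these five keys (`dict.get` would give `None` for a missing one), and that names the outer year column `Year1` as in the other answer; the sample `data` is a trimmed copy of the question's dictionary:

```python
import pandas as pd

# Trimmed sample with the question's shape: year -> author -> number -> record
data = {
    2008: {
        "Barack Obama": {
            1: {"Author": "Barack Obama", "City": [],
                "Title": "Keynote Address at the 2004 Democratic National Convention",
                "Type": "address", "Year": 2008},
            2: {"Author": "Barack Obama", "City": ["Springfield"],
                "Title": "Remarks Announcing Candidacy for President in Springfield, Illinois",
                "Type": "remarks", "Year": 2008},
        },
        "Bill Richardson": {
            2: {"Author": "Bill Richardson", "City": [],
                "Title": "Address to the DNC Winter Meeting",
                "Type": "address", "Year": 2008},
        },
    },
    2012: {
        "Barack Obama": {
            1: {"Author": "Barack Obama", "City": ["Parma"],
                "Title": "535 - Remarks at a Campaign Rally in Parma, Ohio",
                "Type": "remarks", "Year": "2012"},
        },
    },
}

# Field names are looked up explicitly, so key order inside a record does not matter
fields = ["Author", "City", "Title", "Type", "Year"]
df = pd.DataFrame(
    [
        [year, author, number] + [record.get(f) for f in fields]
        for year, authors in data.items()
        for author, records in authors.items()
        for number, record in records.items()
    ],
    columns=["Year1", "Author1", "No."] + fields,
)
print(df)
```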