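I have a 4-level nested dictionary that I want to convert to a pandas DataFrame (or a panel-style table) so that I can export it to CSV. I want every cell to be filled with the information from the nested dictionary.
My nested dictionary looks like the one below, but the real data has many more keys and values.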
{2008: {'Barack Obama': {1: {'Author': 'Barack Obama',
'City': [],
'Title': 'Keynote Address at the 2004 Democratic National Convention',
'Type': 'address',
'Year': 2008},
2: {'Author': 'Barack Obama',
'City': ['Springfield'],
'Title': 'Remarks Announcing Candidacy for President in Springfield, Illinois',
'Type': 'remarks',
'Year': 2008},
3: {'Author': 'Barack Obama',
'City': ['Chicago'],
'Title': 'Remarks at the AIPAC Policy Forum in Chicago',
'Type': 'remarks',
'Year': 2008}},
'Bill Richardson': {1: {'Author': 'Bill Richardson',
'City': [],
'Title': 'Iraq Speech to New Hampshire Democratic State Party State Central Committee',
'Type': 'speech',
'Year': 2008},
2: {'Author': 'Bill Richardson',
'City': [],
'Title': 'Address to the DNC Winter Meeting',
'Type': 'address',
'Year': 2008},
3: {'Author': 'Bill Richardson',
'City': [],
'Title': 'Speech: The New Realism and the Rebirth of American Leadership',
'Type': 'speech',
'Year': 2008}}},
2012: {'Barack Obama': {1: {'Author': 'Barack Obama',
'City': ['Parma'],
'Title': '535 - Remarks at a Campaign Rally in Parma, Ohio',
'Type': 'remarks',
'Year': '2012'},
2: {'Author': 'Barack Obama',
'City': ['Sandusky'],
'Title': '534 - Remarks at a Campaign Rally in Sandusky, Ohio',
'Type': 'remarks',
'Year': '2012'},
3: {'Author': 'Barack Obama',
'City': [],
'Title': '533 - Remarks at a Campaign Rally in Maumee, Ohio',
'Type': 'remarks',
'Year': '2012'}}}}
I would like to convert it into this data frame:
Year Author1 No. Author City Title Type Year
2008 Barack Obama 1 Barack Obama [] .... address 2008
2008 Barack Obama 2 Barack Obama ['Springfield'] .... remarks 2008
2008 Barack Obama 3 Barack Obama ['Chicago'] .... remarks 2008
.......................
2008 Bill Richardson 1 Bill Richardson [] .... speech 2008
2008 Bill Richardson 2 Bill Richardson [] .... address 2008
2008 Bill Richardson 3 Bill Richardson [] .... speech 2008
.............
2012 Barack Obama 1 Barack Obama ['Parma'] .... remarks 2012
2012 Barack Obama 2 Barack Obama ['Sandusky'] .... remarks 2012
2012 Barack Obama 3 Barack Obama [] .... remarks 2012
.....................
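I have read some answers that use a for loop to build the data frame, but they produce a merged index in the first column, whereas I want every cell to contain the information from the dictionary. Any suggestions? Thanks!
I have already tried the code below. It does not give me what I want: the index cells in the first column are merged, and it does not handle a 4-level nested dictionary. I modified the loops to add one more level, but the result ended up with 3-element tuples, which is not what I want either.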
pd.DataFrame.from_dict({(i,j): user_dict[i][j]
for i in user_dict.keys()
for j in user_dict[i].keys()},
orient='index')
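The "merged" first column is a `MultiIndex` built from the tuple keys. It is displayed with repeated labels collapsed, but every row still carries all of its keys. A minimal sketch, assuming a cut-down `user_dict` with the same four-level shape as the data above (the level names `Year1`, `Author1` and `No.` are labels chosen here for illustration), extends the tuple keys to three levels and then moves the index levels into ordinary columns with `reset_index()`:

```python
import pandas as pd

# Cut-down sample with the same nesting as the question's data (assumption).
user_dict = {
    2008: {
        'Barack Obama': {
            1: {'Author': 'Barack Obama', 'City': [],
                'Type': 'address', 'Year': 2008},
            2: {'Author': 'Barack Obama', 'City': ['Springfield'],
                'Type': 'remarks', 'Year': 2008},
        },
    },
    2012: {
        'Barack Obama': {
            1: {'Author': 'Barack Obama', 'City': ['Parma'],
                'Type': 'remarks', 'Year': '2012'},
        },
    },
}

# One (year, author, number) tuple key per innermost record.
flat = {
    (year, author, number): record
    for year, authors in user_dict.items()
    for author, records in authors.items()
    for number, record in records.items()
}

df = pd.DataFrame.from_dict(flat, orient='index')
# Name the three index levels (hypothetical names), then turn them into columns.
df = df.rename_axis(['Year1', 'Author1', 'No.']).reset_index()
print(df)
```

After `reset_index()` each row repeats its year, author and number explicitly, so nothing appears merged when the frame is printed or exported.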
Answer 0 (score: 2)
Building the dictionary first, with descriptive names, makes it easier to see what is going on.
import pandas as pd

# `data` is the nested dictionary from the question.
temp = {}
for year1, values1 in data.items():
    for author1, values2 in values1.items():
        for number, values3 in values2.items():
            # Record the three outer keys for this row ...
            temp.setdefault('Year1', []).append(year1)
            temp.setdefault('Author1', []).append(author1)
            temp.setdefault('No.', []).append(number)
            # ... then every field of the innermost record.
            for key, value in values3.items():
                temp.setdefault(key, []).append(value)
print(pd.DataFrame(temp))
Output:
Author Author1 City No. \
0 Barack Obama Barack Obama [] 1
1 Barack Obama Barack Obama [Springfield] 2
2 Barack Obama Barack Obama [Chicago] 3
3 Bill Richardson Bill Richardson [] 1
4 Bill Richardson Bill Richardson [] 2
5 Bill Richardson Bill Richardson [] 3
6 Barack Obama Barack Obama [Parma] 1
7 Barack Obama Barack Obama [Sandusky] 2
8 Barack Obama Barack Obama [] 3
Title Type Year Year1
0 Keynote Address at the 2004 Democratic Nationa... address 2008 2008
1 Remarks Announcing Candidacy for President in ... remarks 2008 2008
2 Remarks at the AIPAC Policy Forum in Chicago remarks 2008 2008
3 Iraq Speech to New Hampshire Democratic State ... speech 2008 2008
4 Address to the DNC Winter Meeting address 2008 2008
5 Speech: The New Realism and the Rebirth of Ame... speech 2008 2008
6 535 - Remarks at a Campaign Rally in Parma, Ohio remarks 2012 2012
7 534 - Remarks at a Campaign Rally in Sandusky,... remarks 2012 2012
8 533 - Remarks at a Campaign Rally in Maumee, Ohio remarks 2012 2012
Then create the DataFrame with the column order you want:
df = pd.DataFrame(temp, columns=['Year1', 'Author1', 'No.', 'Author',
'City', 'Title', 'Type', 'Year'])
df
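Since the stated goal is a CSV file, the flattened frame can then be written out with `DataFrame.to_csv`. A minimal sketch, assuming a two-row stand-in for `temp` and an in-memory buffer instead of a real file (passing a path such as the hypothetical `'speeches.csv'` would write to disk instead); `index=False` leaves out the 0..n row labels:

```python
import io

import pandas as pd

# Stand-in for the `temp` dictionary built above (values are assumptions).
temp = {
    'Year1': [2008, 2012],
    'Author1': ['Barack Obama', 'Barack Obama'],
    'No.': [1, 1],
    'City': [['Springfield'], []],
    'Type': ['remarks', 'remarks'],
}
df = pd.DataFrame(temp, columns=['Year1', 'Author1', 'No.', 'City', 'Type'])

# List-valued cells are written in their string form, e.g. ['Springfield'].
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the `City` cells hold Python lists, they land in the file as text; whether that is acceptable depends on how the CSV will be read back.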
Answer 1 (score: 0)
Using a list comprehension:
df = pd.DataFrame(
    [[k, j, n] + list(m.values())
     for k, i in d.items()
     for j, l in i.items()
     for n, m in l.items()],
    columns=['Year', 'Author1', 'No.', 'Author', 'City', 'Title', 'Type', 'Year'])
# df
# Year Author1 No. Author City Title Type Year
# 0 2008 Barack Obama 1 Barack Obama [] Keynote Address at the 2004 Democratic Nationa... address 2008
# 1 2008 Barack Obama 2 Barack Obama [Springfield] Remarks Announcing Candidacy for President in ... remarks 2008
# 2 2008 Barack Obama 3 Barack Obama [Chicago] Remarks at the AIPAC Policy Forum in Chicago remarks 2008
# 3 2008 Bill Richardson 1 Bill Richardson [] Iraq Speech to New Hampshire Democratic State ... speech 2008
# 4 2008 Bill Richardson 2 Bill Richardson [] Address to the DNC Winter Meeting address 2008
# 5 2008 Bill Richardson 3 Bill Richardson [] Speech: The New Realism and the Rebirth of Ame... speech 2008
# 6 2012 Barack Obama 1 Barack Obama [Parma] 535 - Remarks at a Campaign Rally in Parma, Ohio remarks 2012
# 7 2012 Barack Obama 2 Barack Obama [Sandusky] 534 - Remarks at a Campaign Rally in Sandusky,... remarks 2012
# 8 2012 Barack Obama 3 Barack Obama [] 533 - Remarks at a Campaign Rally in Maumee, Ohio remarks 2012
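The comprehension above relies on every record listing its values in the same order, and it yields two columns that are both named `Year`. A variation, sketched here under the assumption of a cut-down `d` with the same nesting, builds one dictionary per row so that pandas matches values to columns by key. The outer-key names `Year1`, `Author1` and `No.` are borrowed from the first answer:

```python
import pandas as pd

# Cut-down sample with the same nesting as the question's data (assumption).
d = {
    2008: {'Bill Richardson': {
        1: {'Author': 'Bill Richardson', 'City': [],
            'Type': 'speech', 'Year': 2008},
        # Keys deliberately listed in a different order.
        2: {'Year': 2008, 'Type': 'address',
            'City': [], 'Author': 'Bill Richardson'},
    }},
}

# One dict per row: the three outer keys plus every field of the record.
rows = [
    {'Year1': year, 'Author1': author, 'No.': number, **record}
    for year, authors in d.items()
    for author, records in authors.items()
    for number, record in records.items()
]
df = pd.DataFrame(rows)
print(df)
```

With row dictionaries, a record that lacks a field simply gets `NaN` in that column instead of shifting the remaining values.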