I have a DataFrame in which several columns contain dictionaries.
It can be created as follows:
import pandas as pd

lis = [
    {'id': '1',
     'author': {'self': 'A',
                'displayName': 'A'},
     'created': '2018-12-18',
     'items': {'field': 'status',
               'fromString': 'Backlog'}},
    {'id': '2',
     'author': {'self': 'B',
                'displayName': 'B'},
     'created': '2018-12-18',
     'items': {'field': 'status',
               'fromString': 'Funnel'}}]
pd.DataFrame(lis)
                              author     created id                                         items
0  {'self': 'A', 'displayName': 'A'}  2018-12-18  1  {'field': 'status', 'fromString': 'Backlog'}
1  {'self': 'B', 'displayName': 'B'}  2018-12-18  2   {'field': 'status', 'fromString': 'Funnel'}
I want to convert this information into a DataFrame with multi-level columns.
I have been trying:
pd.MultiIndex.from_product(lis)
pd.MultiIndex.from_frame(pd.DataFrame(lis))
but neither gives the result I want. Basically, I would like something like this:
author                  created     id  items
self  displayName                       field   fromString
A     A                 2018-12-18  1   status  Backlog
B     B                 2018-12-18  2   status  Funnel
Any suggestions on how to achieve this?
Thanks.
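Answer 0: (score: 3)
You can use json_normalize (exposed as pandas.json_normalize since pandas 1.0), but it flattens the nested keys into single-level column names joined with a . separator: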
# pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize

lis = [
    {'id': '1',
     'author': {'self': 'A',
                'displayName': 'A'},
     'created': '2018-12-18',
     'items': {'field': 'status',
               'fromString': 'Backlog'}},
    {'id': '2',
     'author': {'self': 'B',
                'displayName': 'B'},
     'created': '2018-12-18',
     'items': {'field': 'status',
               'fromString': 'Funnel'}}]
df = json_normalize(lis)
print(df)
  id     created author.self author.displayName items.field items.fromString
0  1  2018-12-18           A                  A      status          Backlog
1  2  2018-12-18           B                  B      status           Funnel
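As a side note, the separator used for the flattened names is configurable through the `sep` parameter of `json_normalize`. The sketch below assumes pandas 1.0 or later (for `pd.json_normalize`) and uses `_` purely as an illustrative choice; whichever separator is used here is the one to split on later.

```python
import pandas as pd

lis = [
    {'id': '1',
     'author': {'self': 'A', 'displayName': 'A'},
     'created': '2018-12-18',
     'items': {'field': 'status', 'fromString': 'Backlog'}},
    {'id': '2',
     'author': {'self': 'B', 'displayName': 'B'},
     'created': '2018-12-18',
     'items': {'field': 'status', 'fromString': 'Funnel'}}]

# Join nested keys with "_" instead of the default "."
flat = pd.json_normalize(lis, sep='_')
print(sorted(flat.columns))
```

A non-default separator can help when the original keys themselves contain dots, since splitting on `.` would then cut them in the wrong place.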
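For a MultiIndex in both the columns and the index, first move every column whose name contains no . into the index with DataFrame.set_index, then split the remaining column names with str.split: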
df = df.set_index(['id','created'])
df.columns = df.columns.str.split('.', expand=True)
print (df)
              author              items
                self displayName  field fromString
id created
1  2018-12-18      A           A status    Backlog
2  2018-12-18      B           B status     Funnel
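The same two steps can be sketched without hard-coding the index columns, by picking every flattened name that has no separator in it. This is only an illustration under the assumption of pandas 1.0 or later; the names `flat`, `plain` and `nested` are made up for the example.

```python
import pandas as pd

lis = [
    {'id': '1',
     'author': {'self': 'A', 'displayName': 'A'},
     'created': '2018-12-18',
     'items': {'field': 'status', 'fromString': 'Backlog'}},
    {'id': '2',
     'author': {'self': 'B', 'displayName': 'B'},
     'created': '2018-12-18',
     'items': {'field': 'status', 'fromString': 'Funnel'}}]

flat = pd.json_normalize(lis)

# Columns without a "." were not nested, so they go into the row index
plain = [name for name in flat.columns if '.' not in name]
nested = flat.set_index(plain)

# Every remaining name has exactly two parts, so the split yields a clean two-level MultiIndex
nested.columns = nested.columns.str.split('.', expand=True)

# Select a whole top-level group, or a single (top, sub) column
print(nested['author'])
print(nested[('items', 'fromString')])
```

Because all remaining names split into the same number of parts, no missing values appear in the column levels with this variant.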
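If you need the MultiIndex only in the columns, that is possible too, but the second level then contains missing values for the names that had no .:
# df here is the flat frame returned by json_normalize, without set_index
df.columns = df.columns.str.split('.', expand=True)
print(df)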
   id     created author               items
  NaN         NaN   self displayName   field fromString
0   1  2018-12-18      A           A  status    Backlog
1   2  2018-12-18      B           B  status     Funnel
The missing values can then be replaced with empty strings (NaN is the only value for which x != x is true):
df = df.rename(columns= lambda x: '' if x != x else x)
print (df)
   id     created author               items
                    self displayName   field fromString
0   1  2018-12-18      A           A  status    Backlog
1   2  2018-12-18      B           B  status     Funnel
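The `x != x` test in the rename step relies on NaN being unequal to itself. A minimal sketch of the same replacement written with `pd.isna`, which some readers may find easier to follow, is shown here; it assumes pandas 1.0 or later and is meant as an equivalent spelling, not a different technique.

```python
import pandas as pd

lis = [
    {'id': '1',
     'author': {'self': 'A', 'displayName': 'A'},
     'created': '2018-12-18',
     'items': {'field': 'status', 'fromString': 'Backlog'}},
    {'id': '2',
     'author': {'self': 'B', 'displayName': 'B'},
     'created': '2018-12-18',
     'items': {'field': 'status', 'fromString': 'Funnel'}}]

df = pd.json_normalize(lis)
df.columns = df.columns.str.split('.', expand=True)

# NaN compares unequal to itself, which is what the "x != x" test exploits
nan = float('nan')
print(nan != nan)  # True

# Same replacement, with the missing-value check spelled out;
# rename applies the function to every label in every column level
df = df.rename(columns=lambda x: '' if pd.isna(x) else x)
print(df.columns.tolist())
```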
Answer 1: (score: 1)
Try the following approach; I hope it helps.
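df = pd.json_normalize(lis)  # pandas >= 1.0; older versions: pd.io.json.json_normalize(lis)
# reorder the data columns alphabetically so the labels built below stay aligned with the data
df = df[sorted(df.columns)]
print(list(df.columns))
# split dotted names into (top, sub) pairs; plain names get None as the second level
tuple_list = [tuple(name.split(".")) if "." in name else (name, None) for name in df.columns]
df.columns = pd.MultiIndex.from_tuples(tuple_list)
print(df)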
The output will look like this:
author                  created     id   items
displayName  self       NaN         NaN  field   fromString
A            A          2018-12-18  1    status  Backlog
B            B          2018-12-18  2    status  Funnel