如何将嵌套字典转换为熊猫数据框?

时间:2020-10-03 19:39:28

标签: python-3.x pandas dataframe

我正在尝试转换一个具有其他数据框内部的数据框,例如:

{
  'id': 3241234,
  'data': {
           'name':'carol',
           'lastname': 'netflik',
           'office': {
                       'num': 3543,
                       'department': 'trigy'
                    }
        }


}

我尝试使用:

pd.DataFrame.from_dict(data)

但是结果数据框看起来像:

               id                                  data
lastname  3241234                               netflik
name      3241234                                 carol
office    3241234  {'num': 3543, 'department': 'trigy'}

有什么主意吗?

1 个答案:

答案 0 :(得分:2)

加载JSON /字典:

  • 使用.json_normalized展开dict
import pandas as pd

data = {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}

df = pd.json_normalize(data)

# display(df)
        id data.name data.lastname  data.office.num data.office.department
0  3241234     carol       netflik             3543                  trigy

如果数据框的列为dicts

# dataframe with column of dicts
df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})

# display(df)
   col2                                                                                                                col
0     1  {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
1     2  {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
2     3  {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}

# normalize the column of dicts
normalized = pd.json_normalize(df['col'])

# join the normalized column to df
df = df.join(normalized).drop(columns=['col'])

# display(df)
   col2       id data.name data.lastname  data.office.num data.office.department
0     1  3241234     carol       netflik             3543                  trigy
1     2  3241234     carol       netflik             3543                  trigy
2     3  3241234     carol       netflik             3543                  trigy

如果数据框的lists列中有dicts

  • 需要使用dictslists中删除.explode
data = [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]

df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})

# display(df)
   col2                                                                                                                  col
0     1  [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
1     2  [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
2     3  [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]

# explode the lists
df = df.explode('col').reset_index(drop=True)

# normalize the column of dicts
normalized = pd.json_normalize(df['col'])

# join the normalized column to df
df = df.join(normalized).drop(columns=['col'])