How do I flatten a nested list of dictionaries into multiple rows?

Time: 2019-07-24 18:26:49

Tags: python python-3.x pandas nested

I have a column in a pandas DataFrame that looks like this:

col1         list_of_dictionaries
1           [{'id': 1,'tid': 1,'measure': 'time','i_id': 0,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 2,'tid': 2,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 3,'tid': 3,'measure': 'time','i_id': 2,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 4,'tid': 4,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z','status': {'calendar': 0, 'business': 0}}]

How can I flatten the list of dictionaries within the same DataFrame so that it looks like this?

col1    id   tid   measure i_id  type    time                 status.calendar     status.business                
1       1    1      time    0     time   2000-06-19T05:08:11Z    0                         0  
1       2    2      time    1     time   2000-06-19T05:08:11Z    0                         0
1       3    3      time    2     time   2000-06-19T05:08:11Z    0                         0
1       4    4      time    1     time   2000-06-19T05:08:11Z    0                         0

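I want to keep the original data and expand within it, creating a new row each time the dictionary keys (column names) repeat.

I tried json_normalize on the column, but got an error: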

AttributeError: 'str' object has no attribute 'values'

Edit:

x is a tuple according to Spyder:

[ { ' i d ' :
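
The AttributeError above, together with a tuple made of single characters, suggests that each cell may hold the text of a list rather than an actual list. That is only a guess from the symptoms. If it applies, a minimal sketch of converting the strings back into Python objects with ast.literal_eval could look like this (the sample frame and the helper name parse_cell are made up for illustration):

```python
import ast

import pandas as pd

# Hypothetical data: the cell is a *string* that merely looks like a list of dicts
df = pd.DataFrame({
    "col1": [1],
    "list_of_dictionaries": [
        "[{'id': 1, 'tid': 1}, "
        "{'id': 2, 'tid': 2, 'status': {'calendar': 0, 'business': 0}}]"
    ],
})


def parse_cell(cell):
    """Turn a string literal into a Python object; leave real lists untouched."""
    return ast.literal_eval(cell) if isinstance(cell, str) else cell


df["list_of_dictionaries"] = df["list_of_dictionaries"].apply(parse_cell)
```

After this step each cell is a real list of dicts, which is what the flattening approaches in the answers expect. ast.literal_eval only accepts Python literals, so it is safer than eval for this kind of input.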

2 answers:

Answer 0: (score: 1)

You can unnest in pure Python and then use json_normalize:

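import pandas as pd
# Pair each col1 value with every dict in its row ('lod' is the list-of-dictionaries column)
ids, x = zip(*[(id_, value) for id_, sub in zip(df['col1'], df['lod'])
               for value in sub])
ndf = pd.json_normalize(list(x))  # pd.io.json.json_normalize before pandas 1.0
ndf.insert(0, 'col1', ids)  # reattach col1 so each row keeps its parent value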
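
For comparison, here is a self-contained sketch of the same unnest-then-normalize idea using DataFrame.explode instead of the zip comprehension. This is a swapped-in technique, not what the answer itself uses. It assumes pandas 1.0 or later (for pd.json_normalize) and a column named lod as in the answer; the sample data is invented.

```python
import pandas as pd

# Hypothetical sample: each row has a list of dicts in "lod"
df = pd.DataFrame({
    "col1": [1, 2],
    "lod": [
        [{"id": 1, "tid": 1},
         {"id": 2, "tid": 2, "status": {"calendar": 0, "business": 0}}],
        [{"id": 3, "tid": 1}],
    ],
})

# One row per dict; col1 is repeated for every element of the list
exploded = df.explode("lod").reset_index(drop=True)

# Expand each dict into columns; nested keys become "status.calendar", etc.
flat = pd.json_normalize(exploded["lod"].tolist())

# Put col1 back next to the expanded columns (rows line up by position)
result = pd.concat([exploded[["col1"]], flat], axis=1)
```

Dicts without a status key end up with NaN in the status.* columns. Rows whose list is empty become NaN after explode, so they would need to be dropped or handled before normalizing.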

Answer 1: (score: 1)

Here is one way to do it:

import pandas as pd

df = pd.DataFrame([{"tt":[{'id': 1,'tid': 1,'measure': 'time','i_id': 0,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 2,'tid': 2,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 3,'tid': 3,'measure': 'time','i_id': 2,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 4,'tid': 4,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z','status': {'calendar': 0, 'business': 0}}], "col1":0}, {"tt":[{'id': 5,'tid': 1,'measure': 'time','i_id': 0,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 6,'tid': 2,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 7,'tid': 3,'measure': 'time','i_id': 2,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 8,'tid': 4,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z','status': {'calendar': 0, 'business': 0}}], "col1":1}])

res = df["tt"].values
# Add the parent row's col1 value to each dict (note: this mutates the dicts stored in df["tt"])
for i, elem in enumerate(res):
    for dic in elem:
        dic["col1"] = df["col1"].iloc[i]

# Collect all dicts into one flat list; appending to a DataFrame row by row is slow
store = []
for x in res:
    store.extend(x)

# Now use json_normalize to expand the dicts and create the DataFrame
df2 = pd.json_normalize(store)  # pd.io.json.json_normalize before pandas 1.0

# Some fluff, if you care
df2.fillna(0, inplace=True)
for col in ["status.business", "status.calendar"]:
    df2[col] = df2[col].astype(int, copy=False)

print(df2)

Output (column order may vary with the pandas version):

   col1  i_id  id measure  status.business  status.calendar  tid                  time  type
0     0     0   1    time                0                0    1  2000-06-19T05:08:11Z  time
1     0     1   2    time                0                0    2  2000-06-19T05:08:11Z  time
2     0     2   3    time                0                0    3  2000-06-19T05:08:11Z  time
3     0     1   4    time                0                0    4  2000-06-19T05:08:11Z  time
4     1     0   5    time                0                0    1  2000-06-19T05:08:11Z  time
5     1     1   6    time                0                0    2  2000-06-19T05:08:11Z  time
6     1     2   7    time                0                0    3  2000-06-19T05:08:11Z  time
7     1     1   8    time                0                0    4  2000-06-19T05:08:11Z  time
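
If mutating the dicts in place is a concern, the record_path and meta parameters of json_normalize can carry col1 along without touching the original data. The following is a rough sketch under the assumption of pandas 1.0 or later, with a shortened, made-up version of the sample frame:

```python
import pandas as pd

# Hypothetical, shortened version of the sample data above
df = pd.DataFrame([
    {"col1": 0, "tt": [{"id": 1, "tid": 1},
                       {"id": 2, "tid": 2, "status": {"calendar": 0, "business": 0}}]},
    {"col1": 1, "tt": [{"id": 5, "tid": 1}]},
])

# Each DataFrame row becomes {"col1": ..., "tt": [...]}
records = df.to_dict("records")

# record_path selects the list to unroll; meta copies col1 onto every unrolled row
df2 = pd.json_normalize(records, record_path="tt", meta=["col1"])

# Optional cleanup: dicts without "status" yield NaN, so fill and cast to int
status_cols = ["status.calendar", "status.business"]
df2[status_cols] = df2[status_cols].fillna(0).astype(int)
```

Here col1 is appended as the last column and comes back with object dtype, so a cast such as df2["col1"].astype(int) may be wanted afterwards.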