pandas: list of dictionaries to separate columns

Time: 2017-03-26 17:31:58

Tags: python python-3.x pandas

I have a dataset like this:

name    status    number   message
matt    active    12345    [job:  , money: none, wife: none]
james   active    23456    [group: band, wife: yes, money: 10000]
adam    inactive  34567    [job: none, money: none, wife:  , kids: one, group: jail]

How can I extract the key/value pairs and turn them into new columns that extend the DataFrame?

Expected output:

name    status   number    job    money    wife    group   kids 
matt    active   12345     none   none     none    none    none
james   active   23456     none   10000    none    band    none
adam    inactive 34567     none   none     none    none    one

The message column contains many different keys.

Any help is much appreciated.

2 answers:

Answer 0 (score: 5)

This is not easy.

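First, use replace with regular expressions to turn each list-like string into a dict literal (\s+ matches one or more whitespace characters), then parse the result with ast.literal_eval.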
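To illustrate what the chain of replacements does, here is a minimal sketch that applies regex substitutions of the same kind to a single string with Python's re module, in order. The helper name to_dict and the STEPS list are hypothetical. The first pattern is :\s*,\s* so that an empty value becomes "none" and the space after the comma is consumed as well; without that, the next key would keep a leading space (for example " money").

```python
import ast
import re

# (pattern, replacement) pairs, applied in this order (hypothetical helper data).
STEPS = [
    (r':\s*,\s*', '":"none","'),  # empty value -> "none", also eats the space before the next key
    (r'\[', '{"'),                 # opening bracket -> start of dict and of the first key
    (r'\]', '"}'),                 # closing bracket -> end of the last value and of the dict
    (r':\s+', '":"'),              # remaining "key: value" separators
    (r',\s+', '","'),              # separators between pairs
]


def to_dict(text):
    """Convert '[k: v, ...]' into a dict (hypothetical helper)."""
    for pattern, repl in STEPS:
        text = re.sub(pattern, repl, text)
    # text is now a dict literal such as {"job":"none","money":"none"}
    return ast.literal_eval(text)


print(to_dict('[job:  , money: none, wife: none]'))
# {'job': 'none', 'money': 'none', 'wife': 'none'}
```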

Then build a DataFrame from the parsed dicts converted to a list, remove the original column with pop, and join the two frames with concat:

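import ast
import pandas as pd

df = pd.DataFrame({
    'name': ['matt', 'james', 'adam'],
    'status': ['active', 'active', 'inactive'],
    'number': [12345, 23456, 34567],
    'message': ['[job:  , money: none, wife: none]',
                '[group: band, wife: yes, money: 10000]',
                '[job: none, money: none, wife:  , kids: one, group: jail]']})

# Turn each "[k: v, ...]" string into a dict literal, in order; an empty value becomes "none"
df['message'] = df['message'].replace([r':\s*,\s*', r'\[', r'\]', r':\s+', r',\s+'],
                                      ['":"none","', '{"', '"}', '":"', '","'], regex=True)
df['message'] = df['message'].apply(ast.literal_eval)
# One column per key; keys missing from a row become NaN
df1 = pd.DataFrame(df.pop('message').tolist(), index=df.index)
print(df1)
    job  money  wife group kids
0  none   none  none   NaN  NaN
1   NaN  10000   yes  band  NaN
2  none   none  none  jail  one

df = pd.concat([df, df1], axis=1)
print(df)
    name    status  number   job  money  wife group kids
0   matt    active   12345  none   none  none   NaN  NaN
1  james    active   23456   NaN  10000   yes  band  NaN
2   adam  inactive   34567  none   none  none  jail  one

Edit: another solution uses yaml.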
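A different approach, not taken from either answer: a minimal sketch that parses each string directly with str methods instead of rewriting it into a Python literal, then fills keys missing from a row with 'none' to follow the convention of the expected output. The helper name parse_message is hypothetical, and the sketch assumes keys and values never contain commas or colons.

```python
import pandas as pd


def parse_message(text):
    """Turn '[job:  , money: none]' into {'job': 'none', 'money': 'none'}.

    Hypothetical helper: assumes keys and values contain no ',' or ':'.
    """
    pairs = {}
    inner = text.strip().strip('[]')  # drop the surrounding brackets
    for item in inner.split(','):
        if not item.strip():
            continue  # tolerate stray or trailing commas
        key, _, value = item.partition(':')
        pairs[key.strip()] = value.strip() or 'none'  # empty value -> 'none'
    return pairs


df = pd.DataFrame({
    'name': ['matt', 'james', 'adam'],
    'status': ['active', 'active', 'inactive'],
    'number': [12345, 23456, 34567],
    'message': ['[job:  , money: none, wife: none]',
                '[group: band, wife: yes, money: 10000]',
                '[job: none, money: none, wife:  , kids: one, group: jail]'],
})

# One column per key, aligned on the original index.
expanded = pd.DataFrame(df['message'].map(parse_message).tolist(), index=df.index)
# Drop the raw column, attach the new ones, and mark missing keys as 'none'.
result = pd.concat([df.drop(columns='message'), expanded], axis=1).fillna('none')
print(result)
```

Splitting on the first colon with str.partition keeps the parsing explicit, at the cost of breaking if a value itself contains a comma.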

Answer 1 (score: 1)

You tagged it as a list but described it as a dictionary, so, assuming each message is an actual dict, this should work:

pd.concat([data.drop(['message'], axis=1), data['message'].apply(pd.Series)], axis=1)
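A small sketch of this answer's line, assuming the message column already holds Python dict objects rather than the bracketed strings shown in the question. The toy frame is made up for illustration; data mirrors the answer's variable name. Keys missing from a row come out as NaN.

```python
import pandas as pd

# Toy frame: 'message' holds real dicts, not bracketed strings.
data = pd.DataFrame({
    'name': ['matt', 'james'],
    'message': [{'job': 'none', 'money': 'none'},
                {'group': 'band', 'money': '10000'}],
})

# apply(pd.Series) turns each dict into a row; its keys become column names.
expanded = data['message'].apply(pd.Series)
result = pd.concat([data.drop(['message'], axis=1), expanded], axis=1)
print(result)
```

If message still holds the strings from the question, apply(pd.Series) only yields a single column (named 0) containing those strings, so they would need to be parsed into dicts first.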