I have a dataset like the following:
name   status    number  message
matt   active    12345   [job: , money: none, wife: none]
james  active    23456   [group: band, wife: yes, money: 10000]
adam   inactive  34567   [job: none, money: none, wife: , kids: one, group: jail]
How do I extract the key-value pairs and turn them into a fully expanded DataFrame?
Expected output:
name   status    number  job   money  wife  group  kids
matt   active    12345   none  none   none  none   none
james  active    23456   none  10000  none  band   none
adam   inactive  34567   none  none   none  none   one
The message column contains several different key types.
Any help is greatly appreciated.
Answer 0: (score: 5)
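This is not easy.
You first need replace to turn each list-like string into a dict literal (\s+ matches one or more whitespace characters), and then ast to parse it.
After that you can use the DataFrame constructor together with concat, with pop removing the message column from df.
Edit: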
import ast
import pandas as pd

# Rewrite each "[k: v, ...]" string as a dict literal. The first pattern fills
# empty values with "none" and also consumes the whitespace after the comma,
# so the following key does not start with a space.
df.message = df.message.replace([r':\s+,\s*', r'\[', r'\]', r':\s+', r',\s+'],
                                ['":"none","', '{"', '"}', '":"', '","'], regex=True)
# Parse the dict literals into real dicts.
df.message = df.message.apply(ast.literal_eval)
# pop removes the message column from df and returns it.
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print(df1)
    job  money  wife  group  kids
0  none   none  none    NaN   NaN
1   NaN  10000   yes   band   NaN
2  none   none  none   jail   one
df = pd.concat([df, df1], axis=1)
print(df)
    name    status  number   job  money  wife  group  kids
0   matt    active   12345  none   none  none    NaN   NaN
1  james    active   23456   NaN  10000   yes   band   NaN
2   adam  inactive   34567  none   none  none   jail   one
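As a cross-check on the regex chain, the same parsing can be sketched in plain Python without ast. The helper name `parse_message` is hypothetical. Two assumptions are baked in: an empty value becomes the string "none" (mirroring the first replacement pattern above), and keys and values never contain commas or colons.

```python
def parse_message(text):
    """Parse '[k: v, k2: , ...]' into a dict; empty values become 'none'.

    Assumes keys and values contain no commas or colons.
    """
    inner = text.strip().strip("[]")
    result = {}
    for item in inner.split(","):
        key, _, value = item.partition(":")
        key = key.strip()
        if not key:
            continue  # skip empty fragments, e.g. from "[]"
        result[key] = value.strip() or "none"
    return result


# The three message strings from the question.
messages = [
    "[job: , money: none, wife: none]",
    "[group: band, wife: yes, money: 10000]",
    "[job: none, money: none, wife: , kids: one, group: jail]",
]
rows = [parse_message(m) for m in messages]
print(rows[0])
```

The resulting list of dicts could then be passed to `pd.DataFrame(rows, index=df.index)` in place of the `ast.literal_eval` step.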
Another solution is possible with yaml.
Answer 1: (score: 1)
You tagged this as a list but describe it as a dictionary, so this should work:
pd.concat([data.drop(['message'], axis=1), data['message'].apply(pd.Series)], axis=1)
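That one-liner relies on every cell of message already being a Python dict, not a string. Below is a minimal sketch under that assumption; the two-row frame `data` and its values are made up for illustration and are not the asker's actual object.

```python
import pandas as pd

# Hypothetical frame whose message cells are already dicts (an assumption).
data = pd.DataFrame({
    "name": ["matt", "james"],
    "message": [
        {"job": "none", "money": "none"},
        {"group": "band", "money": "10000"},
    ],
})

# apply(pd.Series) turns each dict into a row; keys missing from a row become NaN.
expanded = pd.concat(
    [data.drop(["message"], axis=1), data["message"].apply(pd.Series)],
    axis=1,
)
print(expanded)
```

If the cells are still strings such as `[job: , money: none]`, they need to be parsed into dicts first.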