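I have an interesting problem, and I'd like to know whether there is a concise, pythonic (pandastic?) way to do this rather than iterating over the rows of the DataFrame.
Take a DataFrame in which one field holds JSON-encoded information: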
Name Data
0 Joe '[{"label":"a","value":"1"},{"label":"b","value":"2"}]'
1 Sue '[{"label":"a","value":"3"},{"label":"c","value":"4"}]'
2 Bob '[{"label":"b","value":"4"},{"label":"d","value":"1"}]'
I want to expand the JSON field into data columns, taking the union of the distinct column headers, to get this result:
   Name  Data               a  b  c  d
0  Joe   '[{"label":"a"...  1  2
1  Sue   '[{"label":"a"...  3     4
2  Bob   '[{"label":"b"...     4     1
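Blanks are missing values. I know I can use read_json to create a DataFrame from the JSON field, but I then want to flatten those DataFrames back into extra columns of the original dataset.
So, is there an elegant way to do this without iterating over the rows of the DataFrame? Any help would be appreciated.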
Answer 0 (score: 9)
Given
In [96]: df
Out[96]:
Name Data
0 Joe [{"a":"1"},{"b":"2"}]
1 Sue [{"a":"3"},{"c":"4"}]
2 Bob [{"b":"4"},{"d":"1"}]
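if you define
import json
import pandas as pd

def json_to_series(text):
    # Flatten every (key, value) pair from the JSON list of dicts into one Series.
    keys, values = zip(*[item for dct in json.loads(text) for item in dct.items()])
    return pd.Series(values, index=keys)
then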
In [97]: result = pd.concat([df, df['Data'].apply(json_to_series)], axis=1)
In [98]: result
Out[98]:
Name Data a b c d
0 Joe [{"a":"1"},{"b":"2"}] 1 2 NaN NaN
1 Sue [{"a":"3"},{"c":"4"}] 3 NaN 4 NaN
2 Bob [{"b":"4"},{"d":"1"}] NaN 4 NaN 1
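For anyone who wants to run the session above end to end, here is a minimal self-contained sketch. It assumes the Data column holds JSON strings exactly as displayed; the sample frame is rebuilt by hand, and the intermediate name `expanded` is only illustrative.

```python
import json

import pandas as pd

# Rebuild the sample frame shown above; Data holds JSON text, not parsed objects.
df = pd.DataFrame({
    "Name": ["Joe", "Sue", "Bob"],
    "Data": [
        '[{"a":"1"},{"b":"2"}]',
        '[{"a":"3"},{"c":"4"}]',
        '[{"b":"4"},{"d":"1"}]',
    ],
})


def json_to_series(text):
    # Collect every (key, value) pair from the list of dicts into one Series.
    keys, values = zip(*[item for dct in json.loads(text) for item in dct.items()])
    return pd.Series(values, index=keys)


# apply() with a function that returns a Series yields a DataFrame whose
# columns are the union of all keys; keys a row lacks become NaN.
expanded = df["Data"].apply(json_to_series)
result = pd.concat([df, expanded], axis=1)
print(result)
```

Note that the values stay strings ("1", "2", ...) because that is how they are encoded in the JSON. Also, a row whose JSON list is empty would make the `zip(*...)` unpacking fail, so such rows would need separate handling.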
For the label/value format used in the question, given
In [22]: df
Out[22]:
Name Data
0 Joe [{"label":"a","value":"1"},{"label":"b","value...
1 Sue [{"label":"a","value":"3"},{"label":"c","value...
2 Bob [{"label":"b","value":"4"},{"label":"d","value...
if you define
def json_to_series(text):
    # Use each dict's 'label' as the column name and its 'value' as the cell.
    keys, values = zip(*[(dct['label'], dct['value']) for dct in json.loads(text)])
    return pd.Series(values, index=keys)
then
In [20]: result = pd.concat([df, df['Data'].apply(json_to_series)], axis=1)
In [21]: result
Out[21]:
Name Data a b c d
0 Joe [{"label":"a","value":"1"},{"label":"b","value... 1 2 NaN NaN
1 Sue [{"label":"a","value":"3"},{"label":"c","value... 3 NaN 4 NaN
2 Bob [{"label":"b","value":"4"},{"label":"d","value... NaN 4 NaN 1
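As a possible variation (not part of the answer above), the same result can be built by parsing each row into a plain dict and constructing the extra columns in one `pd.DataFrame` call, which avoids creating a Series per row. This is only a sketch under the same assumption that Data holds JSON strings; the names `records` and `expanded` are illustrative. It still loops over the rows in Python, just as `apply` does.

```python
import json

import pandas as pd

# Rebuild the sample frame from the question (label/value format).
df = pd.DataFrame({
    "Name": ["Joe", "Sue", "Bob"],
    "Data": [
        '[{"label":"a","value":"1"},{"label":"b","value":"2"}]',
        '[{"label":"a","value":"3"},{"label":"c","value":"4"}]',
        '[{"label":"b","value":"4"},{"label":"d","value":"1"}]',
    ],
})

# One {label: value} dict per row.
records = [
    {item["label"]: item["value"] for item in json.loads(text)}
    for text in df["Data"]
]

# A list of dicts becomes a frame whose columns are the union of all labels;
# reusing df.index keeps the new columns aligned with the original rows.
expanded = pd.DataFrame(records, index=df.index)
result = df.join(expanded)
print(result)
```

Passing `index=df.index` matters when the original frame does not have a default 0..n-1 index, since `join` aligns on index labels.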