I'm fairly new to pandas, so I apologize if this question is poorly phrased. I have the following DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8)})
     A      B         C
0  foo    one  0.469112
1  bar    one -0.282863
2  foo    two -1.509059
3  bar  three -1.135632
4  foo    two  1.212112
5  bar    two -0.173215
6  foo    one  0.119209
7  foo  three -1.044236
What I want to end up with is:
   foo_B     foo_C  bar_B     bar_C
0    one  0.469112      -         -
1      -         -    one -0.282863
2    two -1.509059      -         -
3      -         -  three -1.135632
4    two  1.212112      -         -
5      -         -    two -0.173215
6    one  0.119209      -         -
7  three -1.044236      -         -
I have no idea which pandas function to use to get this result. Please help.
Answer 0 (score: 5)
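You can call set_index on column A with append=True, which keeps the original index, and then unstack to pivot the values of A into columns. After that, rename the columns as needed for the output.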
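# Add A to the index (keeping the original row labels), then pivot A into columns
df_f = df.set_index('A', append=True).unstack()
# Flatten the (column, A value) pairs into names such as 'foo_B'
df_f.columns = [f'{col[1]}_{col[0]}' for col in df_f.columns]
print(df_f)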
   bar_B  foo_B     bar_C     foo_C
0    NaN    one       NaN -0.230467
1    one    NaN  0.230529       NaN
2    NaN    two       NaN  1.633847
3  three    NaN -0.307068       NaN
4    NaN    two       NaN  0.130438
5    two    NaN  0.459630       NaN
6    NaN    one       NaN -0.791269
7    NaN  three       NaN  0.016670
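The output above has the columns in a different order than the question asked for and shows NaN rather than "-". If an exact match matters, a minimal sketch of one possible follow-up is below. It assumes the "-" placeholders are wanted literally, and note that filling them in turns the numeric columns into object dtype, so leaving NaN in place is usually better for further computation. The values in C are random, so they will differ from run to run.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
})

# Move A into the index next to the original row labels, then pivot it into columns.
df_f = df.set_index('A', append=True).unstack()

# The columns are now (original column, A value) pairs, e.g. ('B', 'foo').
df_f.columns = [f'{a}_{col}' for col, a in df_f.columns]

# Assumption: the question's column order and '-' placeholders are wanted as written.
# Casting to object first makes the mixed float/string columns explicit.
wanted = ['foo_B', 'foo_C', 'bar_B', 'bar_C']
df_f = df_f[wanted].astype(object).fillna('-')
print(df_f)
```

Selecting with a list of column names both reorders the columns and would raise a KeyError if one of the expected names were missing, which is a cheap sanity check on the renaming step.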