How do I create a new column for each unique value in a given column of a Pandas DataFrame?

Time: 2020-04-15 21:17:11

Tags: python python-3.x pandas

I'm fairly new to Pandas, so I apologize if the question is poorly phrased. I have the following DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8)})



     A      B         C         
0  foo    one  0.469112 
1  bar    one -0.282863 
2  foo    two -1.509059
3  bar  three -1.135632  
4  foo    two  1.212112  
5  bar    two -0.173215 
6  foo    one  0.119209 
7  foo  three -1.044236 

What I want to achieve is:

   foo_B      foo_C  bar_B      bar_C
0    one   0.469112      -          -
1      -          -    one  -0.282863
2    two  -1.509059      -          -
3      -          -  three  -1.135632
4    two   1.212112      -          -
5      -          -    two  -0.173215
6    one   0.119209      -          -
7  three  -1.044236      -          -

I have no idea which pandas function to use to get this result. Please help.

1 answer:

Answer 0: (score: 5)

You can call set_index on column A with append=True, which keeps the original index, and then unstack. After that, rename the columns in the output as needed.

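# Move A into the index alongside the original row index, then pivot it into columns
df_f = df.set_index('A', append=True).unstack()
# Flatten the (column, A value) MultiIndex into names like 'foo_B'
df_f.columns = [f'{col[1]}_{col[0]}' for col in df_f.columns]
print(df_f)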
   bar_B  foo_B     bar_C     foo_C
0    NaN    one       NaN -0.230467
1    one    NaN  0.230529       NaN
2    NaN    two       NaN  1.633847
3  three    NaN -0.307068       NaN
4    NaN    two       NaN  0.130438
5    two    NaN  0.459630       NaN
6    NaN    one       NaN -0.791269
7    NaN  three       NaN  0.016670
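
Note that the output above groups the columns by original column (bar_B, foo_B, bar_C, foo_C), while the question asks for them grouped by A value and with '-' in place of NaN. A minimal sketch of one way to finish the job, assuming the column order should follow the first appearance of each value in A. The fixed C values (taken from the question's sample frame, instead of np.random.randn) and the names ordered and df_display are only there for illustration:

```python
import pandas as pd

# Fixed values instead of np.random.randn so the result is reproducible
# (assumption: these are the numbers shown in the question's sample frame).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [0.469112, -0.282863, -1.509059, -1.135632,
          1.212112, -0.173215, 0.119209, -1.044236],
})

# Same approach as the answer: A joins the index, then is pivoted into columns.
df_f = df.set_index('A', append=True).unstack()

# Columns are now (original column, A value) pairs, e.g. ('B', 'bar').
df_f.columns = [f'{a_value}_{col}' for col, a_value in df_f.columns]

# Assumption: group columns per A value, in order of first appearance in A.
ordered = [f'{a}_{col}' for a in df['A'].unique() for col in ['B', 'C']]
df_f = df_f[ordered]

# Display only: casting to object lets '-' sit next to the numbers,
# but the numeric columns are no longer float dtype afterwards.
df_display = df_f.astype(object).fillna('-')
print(df_display)
```

Keeping df_f with NaN for any further computation and using the '-' version only for printing is probably the safer choice, since the placeholder turns the numeric columns into object dtype.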