Pivot a MultiIndex Series into a DataFrame

Posted: 2018-06-29 06:22:14

Tags: python pandas

Starting from an initial MultiIndex DataFrame like the one below:

import numpy as np
import pandas as pd
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
print(df)
                0         1         2         3
bar one -1.111899 -0.673956 -0.045719 -0.654951
    two  0.761249  1.009988  1.718598  1.461674
baz one -1.128029  0.360159 -0.004877 -0.725785
    two -0.007996  1.183093  1.651100 -1.408199
foo one  0.935349  0.816100  1.043749 -0.575600
    two  0.986057  0.790675 -0.302731  1.434262
qux one  0.564661 -2.821966  0.650187 -0.176112
    two -1.353135  0.192120 -0.314343 -1.242303
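Passing a list of two equal-length arrays as index= makes pandas build a two-level MultiIndex. A minimal sketch, assuming the same labels as above, that builds that index explicitly with pd.MultiIndex.from_product and checks that it matches; the zero-filled data is just a placeholder, since only the index matters here.

```python
import numpy as np
import pandas as pd

# Index built implicitly from two parallel arrays, as in the question
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
implicit = pd.DataFrame(np.zeros((8, 4)), index=arrays).index

# The same index built explicitly: every outer label paired with every inner label
explicit = pd.MultiIndex.from_product([['bar', 'baz', 'foo', 'qux'], ['one', 'two']])

print(implicit.nlevels)           # 2
print(implicit.equals(explicit))  # True
```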

I only need the first column, extracted as follows:

series = df[0]
print(series)
bar  one   -1.111899
     two    0.761249
baz  one   -1.128029
     two   -0.007996
foo  one    0.935349
     two    0.986057
qux  one    0.564661
     two   -1.353135
type(series)
<class 'pandas.core.series.Series'>

How can I turn this Series into the following DataFrame?

          bar       baz       foo       qux
one -1.111899 -1.128029  0.935349  0.564661
two  0.761249 -0.007996  0.986057 -1.353135

Note that I don't insist on the intermediate step of extracting the Series; only the resulting DataFrame matters.

1 answer:

Answer 0 (score: 1)

You need unstack on the first index level (level=0):

np.random.seed(123)
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
print (df)
                0         1         2         3
bar one -1.085631  0.997345  0.282978 -1.506295
    two -0.578600  1.651437 -2.426679 -0.428913
baz one  1.265936 -0.866740 -0.678886 -0.094709
    two  1.491390 -0.638902 -0.443982 -0.434351
foo one  2.205930  2.186786  1.004054  0.386186
    two  0.737369  1.490732 -0.935834  1.175829
qux one -1.253881 -0.637752  0.907105 -1.428681
    two -0.140069 -0.861755 -0.255619 -2.798589

df1 = df[0].unstack(level=0)
print (df1)
          bar       baz       foo       qux
one -1.085631  1.265936  2.205930 -1.253881
two -0.578600  1.491390  0.737369 -0.140069
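unstack moves one index level into the columns, and level=0 picks the outer level (bar, baz, ...). With the default level=-1 the inner level (one, two) would be pivoted instead, giving the transposed table. A small sketch with hand-picked values (not the random data above) that makes the difference visible:

```python
import pandas as pd

# A two-level Series shaped like df[0], with easy-to-read values
s = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.MultiIndex.from_product([['bar', 'baz'], ['one', 'two']]),
)

# Outer level -> columns: rows are 'one'/'two', columns are 'bar'/'baz'
outer = s.unstack(level=0)
print(outer)
#      bar  baz
# one  1.0  3.0
# two  2.0  4.0

# Default (innermost level) -> columns: the transposed layout
inner = s.unstack()
print(inner)
#      one  two
# bar  1.0  2.0
# baz  3.0  4.0
```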

Another solution is to first unstack the whole DataFrame, which creates a MultiIndex in the columns, and then select with DataFrame.xs:

df1 = df.unstack(level=0)
print (df1)
            0                                       1                      \
          bar       baz       foo       qux       bar       baz       foo   
one -1.085631  1.265936  2.205930 -1.253881  0.997345 -0.866740  2.186786   
two -0.578600  1.491390  0.737369 -0.140069  1.651437 -0.638902  1.490732   

                      2                                       3            \
          qux       bar       baz       foo       qux       bar       baz   
one -0.637752  0.282978 -0.678886  1.004054  0.907105 -1.506295 -0.094709   
two -0.861755 -2.426679 -0.443982 -0.935834 -0.255619 -0.428913 -0.434351   


          foo       qux  
one  0.386186 -1.428681  
two  1.175829 -2.798589 

# more general solution: select key 0 from the first column level
df2 = df1.xs(0, level=0, axis=1)
# if you only need to select from the first level
#df2 = df1[0]
print (df2)
          bar       baz       foo       qux
one -1.085631  1.265936  2.205930 -1.253881
two -0.578600  1.491390  0.737369 -0.140069
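Both routes should give the same table, because column 0 of df.unstack(level=0) holds exactly the values of df[0], pivoted the same way. A quick sketch that checks this, reusing the answer's construction with np.random.seed(123). Note that xs drops the selected level by default (drop_level=True), which is why the result has plain bar/baz/foo/qux columns.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

# Route 1: select column 0 first, then pivot the outer index level
via_series = df[0].unstack(level=0)

# Route 2: pivot the whole frame, then pick column 0 from the column MultiIndex
wide = df.unstack(level=0)
via_xs = wide.xs(0, level=0, axis=1)
via_getitem = wide[0]

print(via_series.equals(via_xs))       # True
print(via_series.equals(via_getitem))  # True
```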