pandas: convert some columns of a MultiIndex DataFrame to rows

Time: 2020-03-27 02:45:27

Tags: pandas multi-index

I have a DataFrame with MultiIndex columns like this:

col_type1  col_type2   col_a           col_b           col_c
col_type1  col_type2   col_x  col_y    col_x col_y    col_x  col_y
type11     type21      10     100      11    101      12     102
type12     type22      20     200      21    201      22     202
type13     type23      30     300      31    301      32     302

Here is the code to create the DataFrame:

import pandas as pd

# Each key is a (level 0, level 1) column label; values are stored as strings.
df = pd.DataFrame.from_dict(
{('col_type1', 'col_type1'): {0: 'type11', 1: 'type12', 2: 'type13'},
 ('col_type2', 'col_type2'): {0: 'type21', 1: 'type22', 2: 'type23'},
 ('col_a', 'col_x'): {0: '10', 1: '20', 2: '30'},
 ('col_a', 'col_y'): {0: '100', 1: '200', 2: '300'},
 ('col_b', 'col_x'): {0: '11', 1: '21', 2: '31'},
 ('col_b', 'col_y'): {0: '101', 1: '201', 2: '301'},
 ('col_c', 'col_x'): {0: '12', 1: '22', 2: '32'},
 ('col_c', 'col_y'): {0: '102', 1: '202', 2: '302'}})

I want to melt this DataFrame into the format below, keeping col_type1 and col_type2 and converting the level-0 column labels into rows:

col_type1 col_type2  col_convert col_x  col_y
type11    type21     col_a       10     100  
type11    type21     col_b       11     101
type11    type21     col_c       12     102
type12    type22     col_a       20     200  
type12    type22     col_b       21     201
type12    type22     col_c       22     202
type13    type23     col_a       30     300  
type13    type23     col_b       31     301
type13    type23     col_c       32     302

I tried melt(), which accepts a col_level parameter.

But when I set it to 0, level 1 is lost.

When I set it to 1, level 0 is lost.
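To make the behaviour described above concrete, here is a minimal sketch that assumes the sample frame from the question (rebuilt with a plain dict for brevity). With `col_level` set, `melt()` works from only that one level of column labels, so the other level does not appear in the result. The names `melted0` and `melted1` are introduced here for illustration only.

```python
import pandas as pd

# Sample data from the question; tuple keys become MultiIndex column labels.
df = pd.DataFrame({
    ('col_type1', 'col_type1'): ['type11', 'type12', 'type13'],
    ('col_type2', 'col_type2'): ['type21', 'type22', 'type23'],
    ('col_a', 'col_x'): ['10', '20', '30'],
    ('col_a', 'col_y'): ['100', '200', '300'],
    ('col_b', 'col_x'): ['11', '21', '31'],
    ('col_b', 'col_y'): ['101', '201', '301'],
    ('col_c', 'col_x'): ['12', '22', '32'],
    ('col_c', 'col_y'): ['102', '202', '302'],
})

# col_level=0: only the top-level labels survive; col_x / col_y are gone.
melted0 = df.melt(col_level=0)

# col_level=1: only the second-level labels survive; col_a / col_b / col_c are gone.
melted1 = df.melt(col_level=1)

print(sorted(melted0['variable'].unique()))
print(sorted(melted1['variable'].unique()))
```

Either way, one level of the column labels is discarded, which is why `melt()` with `col_level` alone cannot produce the desired layout.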

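I also tried unstack(), which has no equivalent of col_level.

I would have to filter on type1 first and drop the col_type columns,

then unstack the data twice, then add col_type back as type1,

and repeat for type2, type3 ...

Is there a better way?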

1 Answer:

Answer 0: (score: 1)

Updated:

import pandas as pd

df = pd.DataFrame.from_dict(
{('col_type1', 'col_type1'): {0: 'type11', 1: 'type12', 2: 'type13'},
 ('col_type2', 'col_type2'): {0: 'type21', 1: 'type22', 2: 'type23'},
 ('col_a', 'col_x'): {0: '10', 1: '20', 2: '30'},
 ('col_a', 'col_y'): {0: '100', 1: '200', 2: '300'},
 ('col_b', 'col_x'): {0: '11', 1: '21', 2: '31'},
 ('col_b', 'col_y'): {0: '101', 1: '201', 2: '301'},
 ('col_c', 'col_x'): {0: '12', 1: '22', 2: '32'},
 ('col_c', 'col_y'): {0: '102', 1: '202', 2: '302'}})

# Move the identifier columns into the index, stack level 0 of the
# remaining columns into rows, then flatten and tidy the column names.
(df.set_index([('col_type1', 'col_type1'), ('col_type2', 'col_type2')])
   .stack(0)
   .reset_index()
   .rename(columns={('col_type1', 'col_type1'): 'col_type1',
                    ('col_type2', 'col_type2'): 'col_type2',
                    'level_2': 'col_convert'}))

Output:

  col_type1 col_type2 col_convert col_x col_y
0    type11    type21       col_a    10   100
1    type11    type21       col_b    11   101
2    type11    type21       col_c    12   102
3    type12    type22       col_a    20   200
4    type12    type22       col_b    21   201
5    type12    type22       col_c    22   202
6    type13    type23       col_a    30   300
7    type13    type23       col_b    31   301
8    type13    type23       col_c    32   302
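As a variation on the approach above, the sketch below names the index levels before stacking, so that only the newly created level needs renaming afterwards. It assumes the question's sample data (rebuilt with a plain dict) and is one possible arrangement rather than the only one; `id_cols` and `out` are names introduced here for illustration.

```python
import pandas as pd

# Sample data from the question; tuple keys become MultiIndex column labels.
df = pd.DataFrame({
    ('col_type1', 'col_type1'): ['type11', 'type12', 'type13'],
    ('col_type2', 'col_type2'): ['type21', 'type22', 'type23'],
    ('col_a', 'col_x'): ['10', '20', '30'],
    ('col_a', 'col_y'): ['100', '200', '300'],
    ('col_b', 'col_x'): ['11', '21', '31'],
    ('col_b', 'col_y'): ['101', '201', '301'],
    ('col_c', 'col_x'): ['12', '22', '32'],
    ('col_c', 'col_y'): ['102', '202', '302'],
})

# Illustrative name: the identifier columns, addressed by their full tuple labels.
id_cols = [('col_type1', 'col_type1'), ('col_type2', 'col_type2')]

out = (
    df.set_index(id_cols)                              # identifiers leave the columns
      .rename_axis(index=['col_type1', 'col_type2'])   # flat names for the index levels
      .stack(0)                                        # level 0 (col_a/b/c) becomes rows
      .reset_index()                                   # index levels back to columns
      .rename(columns={'level_2': 'col_convert'})      # name the stacked level
)
print(out)
```

The values stay as strings because the sample data stores them that way; if numbers are needed, something like `out[['col_x', 'col_y']].astype(int)` can be applied afterwards. Recent pandas versions may emit a FutureWarning about the `stack` implementation here.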

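Try stacking level 0 of the MultiIndex columns. This only gives the desired shape when the identifier column is in the index rather than among the MultiIndex columns: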

df.stack(0).reset_index()

Output:

       0 level_1  col_x  col_y
0  type1   col_a     10    100
1  type1   col_b     11    101
2  type1   col_c     12    102
3  type2   col_a     20    200
4  type2   col_b     21    201
5  type2   col_c     22    202
6  type3   col_a     30    300
7  type3   col_b     31    301
8  type3   col_c     32    302