Melt a pandas MultiIndex DataFrame while preserving 2 columns

Time: 2019-11-15 18:31:08

Tags: python pandas stack pivot melt

The df I have:

 Index 0      Workstream A    Workstream A
 Index 1        Section A        Section B
 Index 2      Start    End     Start   End
  ABC          2010   2011      2012   2013

The df I want:

Model Workstream Section Start End
ABC     A          A      2010 2011
ABC     A          B      2012 2013

I tried df.melt(), but it does not unpivot the MultiIndex columns correctly.


Setup

import pandas as pd

midx = pd.MultiIndex.from_product([
    ['Workstream A'], ['Section A', 'Section B'], ['Start', 'End']
], names=['Index 0', 'Index 1', 'Index 2'])

df = pd.DataFrame([['2010', '2011', '2012', '2013']], ['ABC'], midx)

with pd.option_context('display.multi_sparse', False):
    print(df)

Index 0 Workstream A Workstream A Workstream A Workstream A
Index 1    Section A    Section A    Section B    Section B
Index 2        Start          End        Start          End
ABC             2010         2011         2012         2013

3 answers:

Answer 0 (score: 1)

stack

(df.stack(level=[0,1])
   .reset_index(level=[-1,-2])
)

Output (you can easily rename the columns afterwards):

Index 2       Index 0    Index 1   End Start
ABC      Workstream A  Section A  2011  2010
ABC      Workstream A  Section B  2013  2012

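With renaming

def strip_col_name(c):
    """Strip the column's own name (e.g. 'Section ') from its values."""
    # Raw f-string avoids the invalid '\s' escape; regex=True is required in pandas 2.0+
    return c.str.replace(rf'{c.name}\s*', '', regex=True)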

(
    df.stack([0, 1])
      .rename_axis(['Model', 'Workstream', 'Section'])
      .reset_index()
      .apply(strip_col_name)
)

Index 2 Model Workstream Section   End Start
0         ABC          A       A  2011  2010
1         ABC          A       B  2013  2012
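
For reference, here is a self-contained sketch of the same stack approach, running from the setup in the question to the requested layout. It assumes a reasonably recent pandas, and it selects the columns by name at the end because the order of Start and End after stack may differ between pandas versions. The names long_df and wanted are illustrative only and do not come from the original answer.

```python
import pandas as pd

# Same frame as in the question's setup
midx = pd.MultiIndex.from_product(
    [['Workstream A'], ['Section A', 'Section B'], ['Start', 'End']],
    names=['Index 0', 'Index 1', 'Index 2'],
)
df = pd.DataFrame([['2010', '2011', '2012', '2013']], index=['ABC'], columns=midx)

# Move the two outer column levels into the row index, then name the index levels
long_df = (
    df.stack([0, 1])
      .rename_axis(['Model', 'Workstream', 'Section'])
      .reset_index()
)
long_df.columns.name = None  # drop the leftover 'Index 2' label

# Remove the 'Workstream ' / 'Section ' prefixes from the values
for col in ['Workstream', 'Section']:
    long_df[col] = long_df[col].str.replace(rf'^{col}\s*', '', regex=True)

# Pick the columns by name so the result does not depend on stack's ordering
wanted = ['Model', 'Workstream', 'Section', 'Start', 'End']
long_df = long_df[wanted]
print(long_df)
```

Selecting by name at the end also takes care of the column reordering that the outputs above leave to the reader.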

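Answer 1 (score: 0)

Most of the work involves manipulating the MultiIndex that makes up the column headers.

If you have the choice, move the fixed labels into the level names to make life easier: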

midx = pd.MultiIndex.from_product([
    ['A'], ['A', 'B'], ['Start', 'End']
], names=['Workstream', 'Section', None])

If you don't have that choice, reshape the existing columns along the same lines:

midx = []
for level, name in enumerate(['Workstream', 'Section']):
    idx = df.columns.get_level_values(level) \
                .str.replace(name, '') \
                .str.strip() \
                .rename(name)
    midx.append(idx)

midx.append(df.columns.get_level_values(2).rename(None))
df.columns = pd.MultiIndex.from_arrays(midx)

After that, it takes just two lines of code to get the desired dataframe:

df.index.name = 'Model'
df.stack(level=[0,1]).reset_index()

Result:

  Model Workstream Section   End Start
0   ABC          A       A  2011  2010
1   ABC          A       B  2013  2012

Reorder the columns as needed.
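
Putting the pieces of this answer together, a minimal end-to-end sketch might look like the following. It works on a copy so the original df is left alone, passes the level names to from_arrays explicitly rather than relying on inference, and finishes with the column reordering mentioned above. The names clean_levels, tidy and result are my own and are not part of the answer.

```python
import pandas as pd

# Same frame as in the question's setup
midx = pd.MultiIndex.from_product(
    [['Workstream A'], ['Section A', 'Section B'], ['Start', 'End']],
    names=['Index 0', 'Index 1', 'Index 2'],
)
df = pd.DataFrame([['2010', '2011', '2012', '2013']], index=['ABC'], columns=midx)

# Strip the fixed label ('Workstream', 'Section') out of each column level
clean_levels = []
for level, name in enumerate(['Workstream', 'Section']):
    labels = (
        df.columns.get_level_values(level)
          .str.replace(name, '', regex=False)  # literal replacement, no regex needed
          .str.strip()
    )
    clean_levels.append(labels)
clean_levels.append(df.columns.get_level_values(2))

tidy = df.copy()
tidy.columns = pd.MultiIndex.from_arrays(
    clean_levels, names=['Workstream', 'Section', None]
)
tidy.index.name = 'Model'

# Stack the two named levels into rows, then put the columns in the requested order
result = tidy.stack(level=[0, 1]).reset_index()
result = result[['Model', 'Workstream', 'Section', 'Start', 'End']]
print(result)
```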

Answer 2 (score: 0)

TL;DR

Use Quang Hoang's solution


pandas.wide_to_long

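I'm disappointed that this was not as simple as I thought it would be. To be fair, the starting dataframe isn't simple either.

I wanted to use pandas.wide_to_long directly. In the end I had to break the problem down and apply wide_to_long over several iterations.

I don't recommend doing this! I just wanted to go through the exercise.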
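
For readers who have not used wide_to_long before, here is a minimal sketch of a single pass on a small, hypothetical flat frame (the frame flat and the column name Field are made up for illustration and are not part of this answer). It shows what the sep and suffix arguments do: columns named "Section A" and "Section B" are matched by the stub "Section", and the part after the separator becomes the value of the new index level.

```python
import pandas as pd

# Hypothetical flat frame: one column per section, one row per field
flat = pd.DataFrame({
    'Model': ['ABC', 'ABC'],
    'Field': ['Start', 'End'],
    'Section A': ['2010', '2011'],
    'Section B': ['2012', '2013'],
})

# The default suffix only matches digits, so a word pattern is needed for 'A' / 'B'
long_df = pd.wide_to_long(
    flat,
    stubnames='Section',
    i=['Model', 'Field'],  # columns that identify a row
    j='Sec',               # name of the new index level holding 'A' / 'B'
    sep=' ',
    suffix=r'\w+',
)
print(long_df)
```

The result is indexed by Model, Field and Sec, with a single value column named Section. The walk-through below applies this kind of pass twice, once per header level.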


kw        = dict(sep=' ', suffix='\\w+')      # I have to use a different suffix from the default
i0        = ['Model', 'Index 1', 'Index 2']   # I'm prepping lists for use in my function calls
i1        = ['Model', 'Index 2', 'Ws']

df_0      = df.rename_axis('Model')           # Rename the axis so that it has the appropriate name
stacked_0 = df_0.stack([1, 2]).reset_index() 

stacked_0

Index 0 Model    Index 1 Index 2 Workstream A
0         ABC  Section A     End         2011
1         ABC  Section A   Start         2010
2         ABC  Section B     End         2013
3         ABC  Section B   Start         2012

First pass with wide_to_long

df_1      = pd.wide_to_long(stacked_0, stubnames='Workstream', i=i0, j='Ws', **kw)

df_1

                           Workstream
Model Index 1   Index 2 Ws           
ABC   Section A End     A        2011
                Start   A        2010
      Section B End     A        2013
                Start   A        2012

stacked_1 = df_1['Workstream'].unstack('Index 1').reset_index()

stacked_1

Index 1 Model Index 2 Ws Section A Section B
0         ABC     End  A      2011      2013
1         ABC   Start  A      2010      2012

df_3      = pd.wide_to_long(stacked_1, stubnames='Section', i=i1, j='Sec', **kw)

df_3

                     Section
Model Index 2 Ws Sec        
ABC   End     A  A      2011
                 B      2013
      Start   A  A      2010
                 B      2012

df_3['Section'].unstack('Index 2').reset_index()

Index 2 Model Ws Sec   End Start
0         ABC  A   A  2011  2010
1         ABC  A   B  2013  2012

All together

kw        = dict(sep=' ', suffix='\\w+')
i0        = ['Model', 'Index 1', 'Index 2']
i1        = ['Model', 'Index 2', 'Ws']

df_0      = df.rename_axis('Model')
stacked_0 = df_0.stack([1, 2]).reset_index()

df_1      = pd.wide_to_long(stacked_0, stubnames='Workstream', i=i0, j='Ws', **kw)
stacked_1 = df_1['Workstream'].unstack('Index 1').reset_index()

df_3      = pd.wide_to_long(stacked_1, stubnames='Section', i=i1, j='Sec', **kw)
df_3['Section'].unstack('Index 2').reset_index()