The df I have:
Index 0  Workstream A    Workstream A
Index 1  Section A       Section B
Index 2  Start    End    Start    End
ABC      2010     2011   2012     2013
The df I want:
Model  Workstream  Section  Start  End
ABC    A           A        2010   2011
ABC    A           B        2012   2013
I tried df.melt(), but it did not unpivot the MultiIndex columns correctly.
import pandas as pd

midx = pd.MultiIndex.from_product([
    ['Workstream A'], ['Section A', 'Section B'], ['Start', 'End']
], names=['Index 0', 'Index 1', 'Index 2'])
df = pd.DataFrame([['2010', '2011', '2012', '2013']], ['ABC'], midx)
# Turn off sparse display so every column label is printed in full
with pd.option_context('display.multi_sparse', False):
    print(df)
Index 0 Workstream A Workstream A Workstream A Workstream A
Index 1    Section A    Section A    Section B    Section B
Index 2        Start          End        Start          End
ABC             2010         2011         2012         2013
Answer 0 (score: 1)
Use stack:
(df.stack(level=[0,1])
   .reset_index(level=[-1,-2])
)
Output (you can easily rename the columns accordingly):
Index 2       Index 0    Index 1   End Start
ABC      Workstream A  Section A  2011  2010
ABC      Workstream A  Section B  2013  2012
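To strip the repeated level labels from the values and name the index levels in one go:
def strip_col_name(c):
    """Strip the column name (and any following whitespace) from the values"""
    # A raw f-string avoids an invalid-escape warning; regex=True is needed
    # because pandas >= 2.0 treats the pattern as a literal string by default
    return c.str.replace(rf'{c.name}\s*', '', regex=True)

(
    df.stack([0, 1])
      .rename_axis(['Model', 'Workstream', 'Section'])
      .reset_index()
      .apply(strip_col_name)
)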
Index 2 Model Workstream Section   End Start
0         ABC          A       A  2011  2010
1         ABC          A       B  2013  2012
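The prefix-stripping step is easier to follow on a single column. The following is a minimal sketch, not the answerer's exact code: it anchors the pattern at the start of the string and escapes the column name with re.escape, both of which are additions made here for safety. The sample Series are made up for illustration.

```python
import re

import pandas as pd


def strip_col_name(c: pd.Series) -> pd.Series:
    """Remove a leading copy of the Series name, plus trailing whitespace."""
    # re.escape and the ^ anchor are assumptions added for this sketch;
    # regex=True is required because pandas >= 2.0 defaults to literal matching.
    pattern = rf'^{re.escape(str(c.name))}\s*'
    return c.str.replace(pattern, '', regex=True)


# Values that start with the column name lose that prefix
workstream = pd.Series(['Workstream A', 'Workstream A'], name='Workstream')
stripped = strip_col_name(workstream).tolist()

# Values that do not contain the column name are left unchanged
model = pd.Series(['ABC'], name='Model')
untouched = strip_col_name(model).tolist()

print(stripped)
print(untouched)
```

Because DataFrame.apply passes each column as a Series whose name is the column label, the same helper works column by column on the stacked frame.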
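Answer 1 (score: 0)
Most of the work involves manipulating the MultiIndex that forms the column headers.
If you have the option, move the fixed labels into the level names to make life easier: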
midx = pd.MultiIndex.from_product([
    ['A'], ['A', 'B'], ['Start', 'End']
], names=['Workstream', 'Section', None])
If you do not have that option, manipulate the existing columns to the same effect:
midx = []
# Strip the fixed label from the first two levels and reuse it as the level name
for level, name in enumerate(['Workstream', 'Section']):
    idx = df.columns.get_level_values(level) \
            .str.replace(name, '') \
            .str.strip() \
            .rename(name)
    midx.append(idx)
# Keep the Start/End level as-is, but without a name
midx.append(df.columns.get_level_values(2).rename(None))
df.columns = pd.MultiIndex.from_arrays(midx)
After that, two lines of code give the desired data frame:
df.index.name = 'Model'
df.stack(level=[0,1]).reset_index()
Result:
  Model Workstream Section   End Start
0   ABC          A       A  2011  2010
1   ABC          A       B  2013  2012
Reorder the columns as needed.
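The reordering step can be done by selecting the columns in the wanted order. This is a minimal end-to-end sketch that assumes the level labels have already been cleaned as described above; the variable names wanted_order and result are made up for illustration.

```python
import pandas as pd

# Columns with the fixed labels already moved into the level names
midx = pd.MultiIndex.from_product(
    [['A'], ['A', 'B'], ['Start', 'End']],
    names=['Workstream', 'Section', None],
)
df = pd.DataFrame([['2010', '2011', '2012', '2013']], index=['ABC'], columns=midx)
df.index.name = 'Model'

# Move the Workstream and Section levels from the columns into the rows
stacked = df.stack(level=[0, 1]).reset_index()

# Select the columns explicitly so the result does not depend on the
# order in which stack happens to return Start and End
wanted_order = ['Model', 'Workstream', 'Section', 'Start', 'End']
result = stacked[wanted_order]

print(result)
```

Selecting by name rather than by position keeps the result stable even if the order of the remaining columns differs between pandas versions.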
Answer 2 (score: 0)
pandas.wide_to_long
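I was disappointed that this was not as straightforward as I had expected. To be fair, the starting data frame is not straightforward either.
I really wanted to use pandas.wide_to_long directly. In the end I had to break the problem down and apply wide_to_long over several iterations.
I do not recommend this! I just wanted the exercise.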
kw = dict(sep=' ', suffix='\\w+') # I have to use a different suffix from the default
i0 = ['Model', 'Index 1', 'Index 2'] # I'm prepping lists for use in my function calls
i1 = ['Model', 'Index 2', 'Ws']
df_0 = df.rename_axis('Model') # Rename the axis so that it has the appropriate name
stacked_0 = df_0.stack([1, 2]).reset_index()
stacked_0
Index 0 Model    Index 1 Index 2 Workstream A
0         ABC  Section A     End         2011
1         ABC  Section A   Start         2010
2         ABC  Section B     End         2013
3         ABC  Section B   Start         2012
First use of wide_to_long:
df_1 = pd.wide_to_long(stacked_0, stubnames='Workstream', i=i0, j='Ws', **kw)
df_1
                            Workstream
Model Index 1   Index 2 Ws
ABC   Section A End     A         2011
                Start   A         2010
      Section B End     A         2013
                Start   A         2012
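To see what the sep and suffix arguments contribute, here is a minimal sketch on a made-up two-column frame (the names wide and long_df are assumptions for illustration). The documented default suffix is '\d+', which only matches numeric suffixes, so letter suffixes such as "A" and "B" need a broader pattern like '\w+'.

```python
import pandas as pd

# A tiny wide frame whose value columns are "<stub><sep><suffix>"
wide = pd.DataFrame({
    'Model': ['ABC'],
    'Section A': ['2010'],
    'Section B': ['2012'],
})

# sep=' ' says the stub and the suffix are separated by a space;
# suffix=r'\w+' lets the suffix be a word ("A", "B") rather than digits only
long_df = pd.wide_to_long(
    wide, stubnames='Section', i='Model', j='Sec', sep=' ', suffix=r'\w+'
)

# The suffixes end up in the new index level "Sec"
flat = long_df.reset_index().sort_values('Sec')
print(flat)
```

Each stub column is collapsed into a single column named after the stub, and the part after the separator becomes the value of the new j level.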
stacked_1 = df_1['Workstream'].unstack('Index 1').reset_index()
stacked_1
Index 1 Model Index 2 Ws Section A Section B
0         ABC     End  A      2011      2013
1         ABC   Start  A      2010      2012
df_3 = pd.wide_to_long(stacked_1, stubnames='Section', i=i1, j='Sec', **kw)
df_3
                     Section
Model Index 2 Ws Sec
ABC   End     A  A      2011
                 B      2013
      Start   A  A      2010
                 B      2012
df_3['Section'].unstack('Index 2').reset_index()
Index 2 Model Ws Sec   End Start
0         ABC  A   A  2011  2010
1         ABC  A   B  2013  2012
All together:
kw = dict(sep=' ', suffix='\\w+')
i0 = ['Model', 'Index 1', 'Index 2']
i1 = ['Model', 'Index 2', 'Ws']
df_0 = df.rename_axis('Model')
stacked_0 = df_0.stack([1, 2]).reset_index()
df_1 = pd.wide_to_long(stacked_0, stubnames='Workstream', i=i0, j='Ws', **kw)
stacked_1 = df_1['Workstream'].unstack('Index 1').reset_index()
df_3 = pd.wide_to_long(stacked_1, stubnames='Section', i=i1, j='Sec', **kw)
df_3['Section'].unstack('Index 2').reset_index()